Sample Survey
Methods and Theory




WILEY PUBLICATIONS
IN STATISTICS 

Walter A. Shewhart, Editor 


Mathematical Statistics 

HANSEN, HURWITZ, and MADOW • Sample Survey Methods
and Theory, Volume II
DOOB • Stochastic Processes 

RAO • Advanced Statistical Methods in Biometric Research 
KEMPTHORNE • The Design and Analysis of Experiments
DWYER • Linear Computations 
FISHER • Contributions to Mathematical Statistics 
WALD • Statistical Decision Functions 

FELLER • An Introduction to Probability Theory and Its Applications, Volume I 

WALD • Sequential Analysis 

HOEL • Introduction to Mathematical Statistics

Applied Statistics 

COCHRAN • Sampling Techniques
WOLD and JUREEN • Demand Analysis
HANSEN, HURWITZ, and MADOW • Sample Survey Methods 
and Theory, Volume I 
CLARK • An Introduction to Statistics 
TIPPETT • The Methods of Statistics, Fourth Edition 
ROMIG • 50-100 Binomial Tables

GOULDEN • Methods of Statistical Analysis, Second Edition 

HALD • Statistical Theory with Engineering Applications 

HALD • Statistical Tables and Formulas 

YOUDEN • Statistical Methods for Chemists 

MUDGETT • Index Numbers 

TIPPETT • Technological Applications of Statistics

DEMING • Some Theory of Sampling 

COCHRAN and COX • Experimental Designs 

RICE • Control Charts

DODGE and ROMIG • Sampling Inspection Tables 

Related Books of Interest to Statisticians

ALLEN and ELY • International Trade Statistics 

HAUSER and LEONARD • Government Statistics for Business Use


Sample Survey 
Methods and Theory


VOLUME II • THEORY 


MORRIS H. HANSEN 

Assistant Director for Statistical Standards
Bureau of the Census 


WILLIAM N. HURWITZ 

Chief Statistician 
Bureau of the Census 


WILLIAM G. MADOW 

Chairman, Statistical Research Laboratory 
University of Illinois 


New York • John Wiley & Sons, Inc.
London • Chapman & Hall, Limited







Copyright, 1953 

BY 

John Wiley & Sons, Inc. 


All Rights Reserved 

This book or any part thereof must not 
be reproduced in any form without the 
written permission of the publisher. 


Copyright, Canada, 1953, International Copyright, 1953 
John Wiley & Sons, Inc., Proprietors 

All Foreign Rights Reserved 
Reproduction in whole or in part forbidden 


Library of Congress Catalog Card Number: 53-8112 


PRINTED IN THE UNITED STATES OF AMERICA




To 


Mildred, Hannah, and Lillian 



Preface 


THIS VOLUME CONTAINS THE FUNDAMENTAL THEORY ON WHICH 
sampling methods are based together with derivations of the formulas 
and proofs of statements made in Volume I. Volume I gives the principles
and methods of sampling and their applications to various types
of problems, and states without proof the formulas appropriate to the
methods presented. The two volumes combined are an attempt to provide
a comprehensive presentation of both sampling theory and practice.

The first three chapters of this volume present the fundamental theorems
on probability, expected values, and variances that are needed
in the development of the sampling theory in the remaining chapters.
Chapters 4 through 11 contain derivations, proofs, and some extensions
of theory for the corresponding chapters of Volume I, and provide a
convenient summary of sampling formulas. These chapters have been
designed primarily to facilitate reference from Volume I, and therefore
suffer somewhat in continuity. They do not contain a discussion of
the application of the results derived; for this the reader is referred to
Volume I. Chapter 12 discusses some of the practical implications of
the treatment of response errors in surveys, and develops a theory for
the methods described. Applications of recent developments in decision
theory have not been included.

Readers desiring only the ability to understand the derivations of 
sampling formulas can apply the theorems of Chapters 2 and 3, without 
proof, as they are introduced in the proofs given in Chapters 4 through 
12. Many will wish to have the fuller command of the methods and 
the ability to extend them that comes through understanding the theory 
in Chapters 2 and 3, where proofs of the theorems are given. 

For the most part, the mathematical background assumed for this 
volume is college algebra, although some calculus is used for a few of 
the proofs. 

For textbook purposes, the following suggestions may serve as guides
in the organization of courses in which proofs are given. For courses
with proofs, either volume can serve as the text, the choice depending
on the emphasis desired; the other can serve as a reference book and
provide supplemental material for the teacher. A one-year course may
begin with Chapter 1 of either volume, the development of selected theorems
from Chapters 2 and 3 of Volume II as indicated in the introductions
to those chapters, followed by Chapters 4 through 6 and selected
materials from Chapters 8 through 12 of either volume. The appropriate
additional theorems of Chapters 2 and 3 may be developed as they
are needed. For a one-semester course, material from Chapters 1
through 6 and 11 may be sufficient. Throughout Chapters 4 through
11 of this volume some of the sections are footnoted “May be deferred.”
For these, in particular, it may be convenient to read the theorems but
omit the proofs if only selected materials are to be covered.

Most of the theory of sample survey design is an immediate consequence
of statistical theory that has been developed and extended by
many persons over a long period of years. No attempt has been made 
to trace or give credit in the text to original sources of sampling theory, 
except for quite recent developments. Over and above the specific 
credits noted in the text we are highly indebted to Miss Blanche Skalak 
and Dr. Margaret Gurney for their assistance in preparing this volume. 
Miss Skalak developed a number of the proofs in Chapters 4 through 
11, wrote up most of the proofs, and reviewed and made numerous helpful
suggestions on the entire manuscript. Dr. Gurney reviewed the
manuscript, made many helpful suggestions, and prepared the index. 


May, 1953 


Morris H. Hansen 
William N. Hurwitz 
William G. Madow 


Contents 


CHAPTER PAGE

INTRODUCTION

1 Introduction and Definitions . . . 1

1. The scope of the theory of sample surveys . . . 1
2. Definitions of population, element, and list . . . 1
3. Definitions of characteristic, elementary unit, and population of analysis . . . 2
4. Definitions of terms such as sample, sampling plan, and sampling unit . . . 4
5. Definitions of estimate, sample design, precision, true value, accuracy, survey design . . . 6
6. Why probability methods should be used in selecting samples . . . 9


FUNDAMENTAL THEORY

2 Operations, Events and Probability . . . 11

1. Introduction . . . 11
2. Summation notation: why we study summation notation . . . 11
3. The notion of probability . . . 15
4. Probability selection methods and the equal probability selection method . . . 15
5. Product events. Independence. Conditional probability . . . 22
6. Some theorems on probabilities . . . 26
7. Some illustrations of the uses of theorems on independent and conditional probability . . . 30
8. Methods of achieving probability (or measurable) sampling plans. The table of random numbers . . . 33
Appendix. Combinations and Permutations . . . 36

3 Random Variables, Expected Values, Variances, Covariances, and Convergence in Probability . . . 39

1. Introduction . . . 39
2. Random variables: mathematical expectation . . . 39
3. Some theorems on mathematical expectation . . . 47
4. Variance, covariance, mean square error, rel-variance, coefficient of variation . . . 50
5. Conditional expectation . . . 59
6. Conditional variance and covariance . . . 63
7. The Tchebycheff inequality. Convergence in probability. Consistency . . . 69
Appendix A. Sums of powers . . . 77



Appendix B. Moments . . . 81
Appendix C. Rapidity of approach to limit . . . 85

DERIVATIONS, PROOFS, AND SOME EXTENSIONS OF THEORY 
FOR CORRESPONDING CHAPTERS OF VOLUME I 

4 Simple Random Sampling . . . 90

1. Mathematical expectation of the arithmetic mean . . . 90
2. Variance of the arithmetic mean . . . 92
3. Covariance and correlation of arithmetic means . . . 96
4. Mathematical expectation of sample variance . . . 98
5. Rel-variance of estimated variance . . . 99
6. Rel-variance of estimated standard deviation . . . 102
7. Condition for approximate rel-variance of estimated standard deviation to be satisfactory . . . 104
8. Rel-variance of estimated standard deviation for simple random sample from binomial distribution . . . 105
9. Size of sample required to estimate standard deviation of a proportion with a prescribed precision . . . 105
10. A simple random sample of a population contains a simple random sample of any subset of the population . . . 106
11. Rel-variance of the ratio of two random variables . . . 107
12. Condition for the approximate standard deviation of estimated ratio to be satisfactory . . . 109
13. Variance of ratio of random variables from simple random sample . . . 111
14. An approximation to the bias of ratio estimate . . . 112
15. Bias of ratio estimate relative to standard deviation . . . 113
16. Conditions for sample ratio to be unbiased estimate of population ratio . . . 114
17. Variance of average and total for subset of population . . . 114
18. Rel-variance of estimated variance of ratio estimate . . . 117
19. Condition for ratio estimate to have smaller rel-variance than simple unbiased estimate . . . 119
20. Consistent estimates of averages and variances . . . 120
21. Functions of random variables which are consistent estimates of the same function of population characteristics . . . 120

5 Stratified Simple Random Sampling . . . 121

1. Expected value, variance, covariance, and correlation of unbiased estimates . . . 121
2. Variance of ratio estimate . . . 124
3. Variance of weighted average of ratios . . . 125
4. Comparison of biases of two ratio estimates, using proportionate stratified sampling . . . 126
5. Difference between the variances of two ratio estimates . . . 128
6. Variance expressed as a sum of components . . . 129
7. Gain due to stratification using proportionate sampling . . . 130
8. Variance of stratum means with random grouping of elements into strata . . . 131





9. Optimum allocation to strata . . . 132
10. Gain of optimum allocation over proportionate stratified sampling . . . 134
11. Optimum allocation with variable costs between strata . . . 135
12. Sample estimates of population variances . . . 137
13. Variance for stratification after sampling . . . 138
14. Increase in variance from duplicating subset of elements . . . 139

6 Simple One- or More Stage Cluster Sampling . . . 142

1. Variance and covariance for a two-stage and for a multi-stage sampling design . . . 144
2. Estimates of total variance and covariance for multi-stage design with simple random sample of first-stage units . . . 151
3. Estimates of total variance and rel-variance for two-stage design . . . 153
4. Estimates of components of rel-variance of ratio estimate for two-stage design . . . 158
5. Rel-variance of ratio estimate expressed in terms of $\delta$, the measure of homogeneity; an estimate of $\delta$ from the sample . . . 161
6. Measure of homogeneity when primary units are equal in size . . . 164
7. Relationship between $\delta$ for primary units and ultimate clusters . . . 165
8. Some physical properties of frequently occurring populations, and values of $\delta$ under specified conditions . . . 168
9. Relationships among measures of homogeneity . . . 170
10. Optimum values for simple two-stage design with simple cost function . . . 172
11. Optimum values for simple two-stage design with more general cost function . . . 173

7 Stratified Single- or Multi-stage Cluster Sampling . . . 177

1. Rel-variance of ratio estimate for two-stage stratified design . . . 177
2. Estimate of the rel-variance for two-stage stratified design . . . 180
3. Estimates of components of rel-variance for two-stage stratified design . . . 181
4. Rel-variance of ratio estimate for three- or more stage stratified sampling . . . 182
5. Gains due to stratification with cluster sampling . . . 185
6. Optimum values for two-stage stratified design with variable sampling fractions and simple cost function . . . 187
7. Optimum values for two-stage stratified design with variable sampling fractions and more complicated cost function . . . 188
8. Optimum values for stratified design with joint use of one- and two-stage sampling, variable sampling fractions, and simple cost function . . . 192


8 Control of Variation in Size of Cluster in Estimating Totals, Averages, or Ratios . . . 194

1. Sample estimates and their variances for two-stage design when first-stage units are selected with varying probabilities . . . 194
2. Determination of optimum probabilities for a two-stage sampling design with self-weighting sample and simple cost function . . . 197






3. Optimum values for two-stage stratified design with self-weighting sample and simple cost function . . . 200
4. Comparison of and . . . 202
5. Effect of variation in size of cluster in estimating totals . . . 203

9 Multi-stage Sampling with Large Primary Sampling Units . . . 205

1. Rel-variance of ratio estimate for a multi-stage stratified design . . . 208
2. Conditions for gain with probability proportionate to size . . . 213
3. When to equalize the sizes of strata . . . 215
4. Estimate of rel-variance of ratio estimate . . . 216
5. An estimate of rel-variance of ratio estimate when only one primary unit is taken from each stratum . . . 218
6. Rel-variance for self-weighting sample in terms of measures of homogeneity . . . 222
7. Optimum allocation for fixed total expenditure for a self-weighting, three-stage stratified design . . . 223
8. Variance of ratio estimates by specific subclasses that can make use of both current and past information . . . 225
9. Reduction in variance due to stratification, when psu’s are large . . . 227
10. Consistent estimates of components of rel-variance . . . 228
11. Optimum values for three-stage stratified design . . . 232
12. Adjustment for changes in probabilities when initial sample is selected with varying probabilities . . . 234

10 Estimating Variances . . . 236

1. Rel-variance of estimate of rel-variance and of coefficient of variation with simple random sampling . . . 236
2. Rel-variance of estimated variance for stratified random sample . . . 237
3. Optimum allocation to strata of a subsample for estimating variance of a stratified random sample . . . 239
4. Rel-variance of estimated variance based on random group totals . . . 240
5. Variance of estimates of components of variance with cluster sampling . . . 243
6. Use of known stratum means and totals in estimating variances . . . 246
7. Confidence limits for median and other position measures . . . 247


11 Regression Estimates, Double Sampling, Sampling for Time Series, and Other Sampling Methods . . . 250

1. Difference and regression estimates . . . 250
2. A consistent estimate of the regression coefficient, $\beta$ . . . 253
3. Variance and optimum allocation for double sampling with regression estimates . . . 254
4. Condition for cost and variance of single and double sampling designs to be equivalent . . . 256
5. Variance and optimum allocation for double sampling with stratification . . . 257
6. Estimate and variance of Latin-square design . . . 262






7. Optimum allocation of sample and optimum weights for estimating a ratio from a stratified sample . . . 265
8. Sampling on two occasions . . . 268
9. Sampling for time series . . . 272

A THEORY FOR RESPONSE ERRORS

12 Response Errors in Surveys . . . 280

1. Role of nonsampling errors in determining survey design . . . 280
2. Some requirements on a mathematical model for response errors . . . 281
3. Effect of interviewers on variance of sample estimates . . . 288
4. Use of specified mathematical model in minimizing effect of both bias and variance . . . 298
5. Effect of uncorrelated and compensating response errors . . . 305
6. Applicability of specified mathematical model . . . 308
7. Derivations and proofs . . . 309

Index . . . 327




CHAPTER 1 


Introduction and Definitions 


1. The scope of the theory of sample surveys. The contents of this
chapter. The theory of sample surveys is concerned with developing,
analyzing, and improving methods of formulating the information wanted
from a sample survey, selecting the sample, obtaining the information from
the sample, translating that information into statements relating to the purposes
of the survey, and evaluating the accuracy of those statements.

As a preliminary to considering the parts of the theory of probability
and mathematical expectation that are needed for the effective study of
the statistical characteristics of sample surveys, we shall, in Secs. 2-5 of
this chapter, introduce some of the more important definitions. In Sec. 6
we shall discuss why probability methods of selecting samples should
be used.

2. Definitions of population, element, and list. By a finite population
is meant any well-defined set or class containing a finite number of
elements, $A_1, A_2, \ldots, A_N$. These elements may be plants, farms, persons,
blocks, counties, businesses, electric light bulbs, insects, and so on. The
population will then consist of certain of these elements: the plants of a
certain kind in a specified field, the farms over a specified size, the
unemployed persons in the United States, the blocks in a specified city,
the counties in which coal is mined, the grocery stores in a specified state,
the United States income tax returns for a stated year, the electric light
bulbs produced in a given plant during a stated period of time, the insects
in a given field. Thus, to define a population we must be able to state
the kind of elements of which it consists and to give rules for including or
excluding any particular element. These rules may take the form of an
enumeration of the elements of the set or may be a statement of the
conditions the elements must satisfy.

When the elements of a population have been numbered or otherwise 
identified, we call that population together with its identification system 
a list. (In some other studies of sample surveys the word frame is used 
as we use the word list.) For example, if the population consists of the
blocks of a specified city, we may obtain a map of that city on which the 
blocks and streets are outlined and assign numbers to the blocks. The
map and numbers constitute a list. If the blocks were numbered in a 
different order, we would have a different list. Similarly, if the population 
consists of the cards in a file, the list is determined by a particular arrange¬ 
ment of the cards. For two lists to be the same they must consist of the 
same elements with the same identifications; consequently, a single 
population may yield many lists. 

For some methods of selecting samples the order in which the population
is arranged in a list will not affect the precision of the information
obtained, whereas for others the order of arrangement may influence the 
precision considerably. 

We have introduced the term list because the recommended methods 
of selecting samples will all involve the use of some form of list. With 
such lists, it is possible to select a sample of elements from the population 
with known probabilities of selection, a prerequisite of the sampling theory 
to be considered. We will ordinarily select a sample of elements by 
selecting the numbers that identify them. Thus, if we were selecting 2 
elements out of a population of 10 elements, we might identify these 10 
elements by the numbers 1, 2, • • •, 10, select 2 of the 10 numbers, and 
say that the sample consisted of the elements identified by these numbers. 
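As a minimal sketch of selecting elements through the numbers that identify them, the Python below draws 2 of the 10 identification numbers without replacement, every pair of numbers being equally likely, and returns the elements those numbers identify. The labels `A1` through `A10` and the function name `select_sample` are illustrative assumptions, and the seed is fixed only so that a run can be repeated.

```python
import random

# Hypothetical list: identification numbers 1..10, each identifying one element.
population_list = {number: f"A{number}" for number in range(1, 11)}


def select_sample(id_list, n, rng):
    """Draw n distinct identification numbers without replacement (every set of
    n numbers equally likely) and return the elements they identify."""
    chosen_numbers = rng.sample(sorted(id_list), n)
    return [id_list[number] for number in chosen_numbers]


rng = random.Random(12345)  # fixed seed only so the run is repeatable
sample = select_sample(population_list, 2, rng)
print(sample)  # two distinct labels among A1..A10
```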

Sometimes the list is such that any of several numbers in the list identify 
the same element of the population. Suppose that we want to select a 
sample of law firms and use as a list the names given in a register of 
lawyers. Then, if a law firm has 10 lawyers in the register, any of 10 
numbers will identify that firm. 
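When several numbers identify the same element, drawing one number with equal probability selects an element with probability proportional to how many numbers identify it. The sketch below assumes a hypothetical register of 20 lawyers in four firms (the firm names and sizes are invented for illustration) and computes each firm's selection probability for a single draw.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical register: entry k is the firm of the k-th listed lawyer.
register = ["Firm A"] * 10 + ["Firm B"] * 3 + ["Firm C"] * 1 + ["Firm D"] * 6


def selection_probabilities(entries):
    """Probability that each firm is selected when one register number is drawn
    with equal probability: (numbers identifying the firm) / (total numbers)."""
    total = len(entries)
    return {firm: Fraction(count, total) for firm, count in Counter(entries).items()}


probs = selection_probabilities(register)
print(probs["Firm A"])  # prints 1/2, since 10 of the 20 numbers identify Firm A
```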

The numbering of the elements of the population need not be simply
$1, \ldots, N$, but may for various administrative and other reasons be more
complicated. For example, a given household in a city may have 2 
numbers that jointly identify it—the number assigned to the block on 
which the household is located, and the number assigned to the household 
within the block. 

3. Definitions of characteristic, elementary unit, and population 
of analysis. By a characteristic of a population is meant any quantity 
or relationship relating to the population. 

Illustration 3.1. Illustrations of characteristics.

Population: All persons in a city.
Some characteristics: Average height; total income; per cent of income spent on food at different levels of income; number of females; distribution of total income among families by size of income; attitude towards taxes; intentions to purchase refrigerators; relation of education and amount of crime.


Sec. 3 


DEFINITIONS OF TERMS 


3 


Population: Electric light bulbs produced on a given date in a given plant.
Characteristics: Number of defective bulbs; average duration of burning of bulbs; variation in length and intensity of burning of bulbs; relationship of all these quantities to the order of production of the bulbs for each machine producing them.

We mean to give to characteristic the broadest possible meaning. It may 
or may not be numerical. It is anything we may wish to learn about the 
population. 

Sometimes it will be convenient to speak of the characteristics of the
elements of the population, e.g., the total income of all persons in an area,
the height of the person, whether the bulb or the group of bulbs is defective,
and so on.

Often, particular interest will center on a particular class of elements 
for which frequency distributions or averages of the characteristic are 
desired. The particular elements for which such distributions or averages 
are desired will be referred to as the elementary units. A characteristic 
of an elementary unit may be an attribute, e.g., a person is male or female; 
or a value of a variable, e.g., the income of an individual or of a family.
Since the same survey may yield different characteristics, it may also refer 
to several elementary units, e.g., the person and the family. Usually the 
objectives of the survey will determine both the elementary units and the 
population consisting of them. The population whose elements are the 
elementary units is called the population of analysis. 

Illustration 3.2. Suppose that the population of analysis consists of all
families in the District of Columbia. Then we may say that $X_i$, $Y_i$, $Z_i$
are the respective characteristics (food expenditures, rent expenditures,
and income in a specified period) for the $i$th family in the population,
which consists of all $N$ families $A_1, \ldots, A_N$ in the District of Columbia.

We will then say that $X_i$ is the value of the characteristic $X$ for the $i$th
element, $A_i$, of the population. For example, $Z_i$ is the value of the
characteristic $Z$, total income, for the $i$th family in the District of
Columbia.

Exercises 

3.1. Suppose that the population of analysis consists of $N$ electric light bulbs
and the characteristic is the proportion of defective bulbs. Define a numerical
characteristic for each of the bulbs such that, if you know the value of this
characteristic for each element of the population, you can compute from these
values and the size of the population the characteristic of the population.
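One natural choice for Ex. 3.1, sketched here with invented data, is an indicator: $X_i = 1$ if the $i$th bulb is defective and $X_i = 0$ otherwise, so that the proportion defective is $\sum X_i / N$. The list of bulbs below is hypothetical.

```python
# Hypothetical population: one entry per bulb, True if the bulb is defective.
bulb_is_defective = [False, True, False, False, True, False, False, False]

# Numerical characteristic of the i-th bulb: X_i = 1 if defective, else 0.
x = [1 if defective else 0 for defective in bulb_is_defective]

N = len(x)  # size of the population
proportion_defective = sum(x) / N
print(proportion_defective)  # 0.25 (2 defective bulbs out of 8)
```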

3.2. State a population of analysis and define two characteristics of the 
population. Also define characteristics of the elements of the population so that 
you can compute the characteristics of the population from those of the elements. 





3.3. Suppose that the population of analysis consists of 5 families, $A_1, A_2, \ldots, A_5$,
and that the values of $X$, $Y$, and $Z$, designating, respectively, food
expenditures, rent expenditures, and total income for these families, are as
follows:


Family    X     Y     Z
  1      10    20   110
  2      15    15   200
  3      13    18    80
  4      18    25    90
  5      24    16    70


Compute the following characteristics: total expenditures for food and rent, 
total income, and per cents of income spent on food and rent. 
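The characteristics asked for in Ex. 3.3 can be computed directly from the table. The sketch below takes "per cent of income spent" to mean the ratio of population totals (total expenditure divided by total income), which is one reasonable reading rather than the only one; an average of the five families' individual percentages would generally give a different figure.

```python
# Data of Ex. 3.3: (food X, rent Y, income Z) for families A1..A5.
families = {
    "A1": (10, 20, 110),
    "A2": (15, 15, 200),
    "A3": (13, 18, 80),
    "A4": (18, 25, 90),
    "A5": (24, 16, 70),
}

total_food = sum(x for x, _, _ in families.values())
total_rent = sum(y for _, y, _ in families.values())
total_income = sum(z for _, _, z in families.values())

# Per cents taken as ratios of population totals (an interpretive assumption).
pct_food = 100 * total_food / total_income
pct_rent = 100 * total_rent / total_income
pct_food_and_rent = 100 * (total_food + total_rent) / total_income

print(total_food, total_rent, total_food + total_rent, total_income)  # 80 94 174 550
print(round(pct_food, 1), round(pct_rent, 1), round(pct_food_and_rent, 1))  # 14.5 17.1 31.6
```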

4. Definitions of terms such as sample, sampling plan, and sampling
unit. A sample is a subset of a population selected to obtain information
concerning the characteristics of the population, i.e., if the population
consists of the $N$ elements $A_1, A_2, \ldots, A_N$, then the sample will consist
of some of these elements, say $A_{i_1}, A_{i_2}, \ldots, A_{i_n}$. We shall be concerned
only with probability samples, i.e., samples for which every element of the
population has a known probability of being included in the sample. (Probability
is discussed in Sec. 4, Ch. 2.) We select a sample and obtain certain information
for the elements of the sample, say the values of the characteristics
$X$, $Y$, $Z$, and combine this information in such a way that we shall have
useful information concerning certain characteristics of the population.

Sometimes the population used for selecting the sample is the population
of analysis, but often a different population is defined. For example,
the elementary units may be people but the populations from which we 
select a sample are the populations of blocks and families. Thus, some 
populations are defined as a result of the problem we are investigating; 
others are defined to help us select the sample. 

For assistance in selecting a sample we may define several populations 
and methods of selecting samples. By the sampling plan we shall mean 
all the steps we take in selecting the sample once the population of 
analysis is defined. 

Illustration 4.1. To select a sample of the families of New York City
we might first define a population consisting of all the blocks in New 
York City and select a sample of those blocks. Then we might define 
the families living on the selected blocks to be a population and select a 
sample of them. Thus, the final sample would consist of families, but 
we would have first selected a sample of blocks to help us select the 
sample of families. In picking the first sample we might use as a list a 
map on which the streets were shown and the blocks numbered. In 
picking the second sample, i.e., the sample of families, we might construct 


Sec. 4 


DEFINITIONS OF TERMS 5 

a list by sending people to the blocks selected in the first sample and 
having these people list the addresses of all families living on the sample 
blocks. Each line on the listing would identify the corresponding family. 
(There may be two or more lines for one address, if there is more than one 
family at a given address.) 
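The two stages of Illustration 4.1 can be sketched as two successive simple random selections: first blocks from the block list, then families from the listing made on each selected block. The block numbers, family labels, and sample sizes below are hypothetical, and equal-probability selection without replacement at each stage is an assumption made for the sketch.

```python
import random

# Hypothetical block list (stage 1) with the family listing made on each
# block (stage 2 list). Label "Fb-k" means family k listed on block b.
blocks = {
    1: ["F1-1", "F1-2", "F1-3"],
    2: ["F2-1", "F2-2"],
    3: ["F3-1", "F3-2", "F3-3", "F3-4"],
    4: ["F4-1", "F4-2", "F4-3"],
}


def two_stage_sample(block_lists, n_blocks, n_per_block, rng):
    """Stage 1: simple random sample of blocks.
    Stage 2: simple random sample of families within each selected block."""
    selected_blocks = rng.sample(sorted(block_lists), n_blocks)
    families = []
    for block in selected_blocks:
        listing = block_lists[block]
        k = min(n_per_block, len(listing))  # cannot take more families than are listed
        families.extend(rng.sample(listing, k))
    return selected_blocks, families


rng = random.Random(2024)  # fixed seed only for repeatability
chosen_blocks, sample_families = two_stage_sample(blocks, 2, 2, rng)
print(chosen_blocks, sample_families)
```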


This method of obtaining a sample of families is an illustration of 
cluster sampling where the cluster is a city block and the family is an 
elementary unit. In general, by cluster sampling we mean that, for 
purposes of selecting a sample, we have defined a population whose 
elements are groups or clusters of elementary units, and that we will 
select a sample of the clusters from that population. Clusters may be 
natural, like the blocks of a city, or they may be constructed, as in 
Illustration 4.3 below.

The elements of the population from which we select the sample are 
called sampling units. If, as in Illustration 4.1 of this section, the elements 
of the sample selected initially are clusters, and a subsample is taken 
from the selected sampling units, we often refer to the clusters as primary 
sampling units or first-stage sampling units, and to the elements of which 
the second sample consists as second-stage sampling units. The sample 
selected in Illustration 4.1 was selected in two stages, but we sometimes 
continue the process to three or more stages. The sample selected at the 
second stage is a subsample of that selected at the first, i.e., the sample 
obtained in the first selection is defined to be a population from which a 
sample is selected at the second stage, and so on. We refer to all such 
sampling plans as multi-stage sampling plans. 

Illustration 4.2. In order to pick a sample of people living in cities, 
we might begin by selecting a sample of counties (the primary sampling
units are counties), then select a sample of cities within the counties 
selected in the first stage (the second-stage sampling units are cities), then 
select a sample of blocks within the selected cities (the third-stage sampling 
units are blocks), then select a sample of families within the selected blocks 
(the fourth-stage sampling units are families), and finally select a sample 
of people within the selected families. This is an example of a five-stage 
sampling plan. Counties, cities, blocks, and families are all clusters of 
elementary units and are also elements of populations from which samples 
are selected. Thus, the counties may be considered to be a cluster of 
cities, of blocks, of families, or of people, but also may be considered 
elements of a population which has counties as elements. The important 
thing is the flexibility with which the words element and cluster are used, 
i.e., cluster simply stands for a grouping of elements that is convenient 



Ch. 1 


6 INTRODUCTION AND DEFINITIONS 

for the selection of the sample, but once the clusters are defined they are 
themselves the elements of a population from which we select a sample. 

In order to avoid confusion in using the terms population, element, and 
sample, it is sufficient to state exactly what the elements and population 
are for each stage of the selection of the sample. It may be remarked 
that, although the population of analysis and, sometimes, the elementary 
units that make it up are uniquely determined by the purposes of the 
survey, ordinarily there are many ways in which populations may be 
defined for selecting the sample to obtain information concerning any 
specified population of analysis. Whenever we select a sample, there will 
be many possible samples of which we select one to be the sample. We 
can say that these possible samples are the elements of a population from 
which one element is selected to be “the” sample. 

Illustration 4.3. Suppose that we are selecting a 10 per cent sample of
the families in a city from a list by first selecting 1 of the first 10 families,
$A_1, A_2, \ldots, A_{10}$, and every tenth family afterwards. Then, if $A_1$ is
selected, the sample consists of $A_1, A_{11}, A_{21}, \ldots$; if $A_2$ is selected, the
sample consists of $A_2, A_{12}, A_{22}, \ldots$. Thus, there are 10
possible samples, of which we select 1 to be “the” sample. Each of these
possible samples is a cluster but not a natural cluster.
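The 10 possible systematic samples of Illustration 4.3 can be enumerated directly. The sketch assumes a hypothetical city list of 100 families numbered 1 to 100; each random start $r$ from 1 to 10 gives the sample $r, r + 10, r + 20, \ldots$, and every family falls in exactly one of the 10 samples. If the list length were not a multiple of 10, some samples would contain one more family than others.

```python
def systematic_samples(n_elements, interval):
    """Map each random start r = 1..interval to the systematic sample
    r, r + interval, r + 2*interval, ... drawn from elements 1..n_elements."""
    return {
        start: list(range(start, n_elements + 1, interval))
        for start in range(1, interval + 1)
    }


# Hypothetical city list of 100 families; a 10 per cent sample uses interval 10.
samples = systematic_samples(100, 10)
print(len(samples))    # 10
print(samples[1][:3])  # [1, 11, 21]
print(samples[2][:3])  # [2, 12, 22]
```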

Exercises 

4.1. Suppose that a population consists of 5 families, $A_1, A_2, A_3, A_4, A_5$,
where $A_i$ contains $i$ persons, $i = 1, 2, \ldots, 5$. List all possible samples consisting
of 2 families and show the number of persons in each sample.
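For Ex. 4.1 the possible samples are the $\binom{5}{2} = 10$ unordered pairs of distinct families. A short enumeration, assuming samples are drawn without replacement and that the order of families within a sample is ignored:

```python
from itertools import combinations

# Ex. 4.1: family A_i contains i persons, i = 1, ..., 5.
persons = {f"A{i}": i for i in range(1, 6)}

# Every unordered pair of distinct families, with its total number of persons.
samples = [
    (pair, sum(persons[family] for family in pair))
    for pair in combinations(persons, 2)
]

for pair, total in samples:
    print(pair, total)
```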

4.2. A population consists of 10 families living on 2 blocks, $B_1$ and $B_2$, 5
families living on each block. The 5 families living on block $B_1$ are denoted
by $A_{11}, \ldots, A_{15}$, where the first subscript identifies the block and the second
identifies the family within the block, and the families consist of 3, 2, 1, 3, 1
persons, respectively; those living on block $B_2$ are denoted by $A_{21}, A_{22}, \ldots, A_{25}$,
and consist of 1, 6, 2, 3, 7 persons, respectively. The sampling plan is a
two-stage plan; the list for the first-stage sample consists of $B_1$ and $B_2$, of
which 1 is to be selected. If $B_1$ is selected in stage 1, we select 3 elements from
the list $A_{11}, \ldots, A_{15}$; if $B_2$ is selected in stage 1, we select 3 elements from the
list $A_{21}, A_{22}, \ldots, A_{25}$. List all possible samples, showing the number of
persons in each sample.

4.3. State a three-stage sampling plan for estimating a specified characteristic of a population you define.

4.4. Is there only one population that is “correct” for estimating the values of certain characteristics of the population? If we wish to estimate total personal income, can we regard people, families, blocks, or file cards listing personal income as elementary units of the population?

5. Definitions of estimate, sample design, precision, true value, accuracy, survey design. After a sample has been selected we prepare estimates based on the sample for specified characteristics of the population of analysis. Often, as in estimating the average personal income in the United States, the estimate may be a single number, e.g., the average income is $2350; but the estimate may also consist of an interval of numbers, e.g., the average income is more than $2200 and less than $2600; and often the estimates consist of several numbers, intervals, and functions. The sample design consists of the sampling plan and the method of estimation.

Although, as we shall see later, there are many estimation equations, we shall for the present consider only one, the arithmetic mean.

Suppose that the population consists of the five families listed in Ex. 3.3, and that when questioned they give the data listed in the table in that exercise. Suppose also that we wish to estimate $\bar{Z}$, the average family income. If we knew the data of Ex. 3.3, we would calculate

$$\bar{Z} = \frac{100 + 210 + 80 + 90 + 70}{5} = \frac{550}{5} = 110$$

But suppose that we do not know the data for the population, and in order to estimate the unknown value, $\bar{Z}$, we decide to select a sample of 2 families, learn the values of $Z$ for these families, and use the average income of these 2 families as an estimate of $\bar{Z}$. There are 10 possible samples of 2 of the elements of the population. These possible samples, the corresponding values of $Z$, and the estimates of $\bar{Z}$ are given in Table 1. Each of the 10 possible samples will yield an estimate of $\bar{Z}$.

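Table 1. Samples of 2 from a population of 5 elements

| Possible sample | Values of $Z$ | Estimate of $\bar{Z}$ |
|---|---|---|
| $A_1, A_2$ | 100, 210 | 155 |
| $A_1, A_3$ | 100, 80 | 90 |
| $A_1, A_4$ | 100, 90 | 95 |
| $A_1, A_5$ | 100, 70 | 85 |
| $A_2, A_3$ | 210, 80 | 145 |
| $A_2, A_4$ | 210, 90 | 150 |
| $A_2, A_5$ | 210, 70 | 140 |
| $A_3, A_4$ | 80, 90 | 85 |
| $A_3, A_5$ | 80, 70 | 75 |
| $A_4, A_5$ | 90, 70 | 80 |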
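Table 1 can be reproduced by direct enumeration. The Python sketch below is only an illustration, not the authors' procedure; the names `Z`, `Z_bar`, `estimates`, and `mse` are hypothetical. As an assumed example of the measure of precision discussed in the next paragraph, it also computes the mean squared deviation of the 10 possible estimates from $\bar{Z}$, counting each possible sample once.

```python
from itertools import combinations
from statistics import mean

# Family incomes Z for A1, ..., A5, as listed in Table 1.
Z = {"A1": 100, "A2": 210, "A3": 80, "A4": 90, "A5": 70}
Z_bar = mean(Z.values())                     # population average, 550 / 5

# Each possible sample of 2 families, with its estimate (the sample mean).
estimates = {pair: mean(Z[f] for f in pair) for pair in combinations(Z, 2)}
for pair, est in estimates.items():
    print(pair, est)

# Assumed measure of precision: mean squared deviation of the possible
# estimates from Z_bar, each possible sample counted once.
mse = mean((est - Z_bar) ** 2 for est in estimates.values())
print(Z_bar, mean(estimates.values()), mse)
```

With these data the ten estimates average exactly $\bar{Z} = 110$, and their mean squared deviation from $\bar{Z}$ is 975.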


By a measure of precision of the estimate we shall mean a measure of how close the set of possible sample estimates for a particular sample design may be expected to come to $\bar{Z}$. To be useful, such measures of precision must either be approximately known from other information, perhaps an earlier survey, or be estimable from the sample itself.

Ordinarily, we would use an approximate value of the measure of precision 
in designing the sample; and then, if we had a reasonably large sample, 
estimate the measure of precision from the data obtained from the sample 
itself. 

It is not enough, however, to consider measures of precision. Was the information that would have been obtained and listed in Ex. 3.3 correct? Did some families understate their incomes? Did the interviewer forget to ask some families about income from pensions and similar sources? Did the interviewer make a mistake and substitute some other family for family $A_1$, which was supposed to be in the sample? By the true value of the characteristic we shall mean the value that would be obtained if no errors were made in any way in obtaining the information or computing the characteristic. By a measure of accuracy of an estimate we mean a measure of how close the estimate may be expected to come to the true value of the characteristic. Thus, even a complete enumeration may not be entirely accurate, but, according to our definition of precision, a complete enumeration is precise.

By the survey design will be meant the sample design together with 
the questionnaire and the method of obtaining the information from 
the sample, or, more generally, the method of measurement. Thus, the 
survey design includes the plans for all the parts of the survey except the 
statement of the objectives. It includes: 

(a) The questionnaire,

(b) Decision on method of observation or interview,

(c) Sample design,

(d) Choice and training of interviewers,

(e) Assignments of interviewers,

(f) Decisions on treatment of noninterviews,

(g) Estimation equations,

(h) Processing of questionnaires,

(i) Preparation of tables,

(j) Studies of precision and accuracy of information,

as well as instructions and methods followed for carrying through these 
operations. It will be seen that each of these parts affects the accuracy 
of the information to be obtained. Since the objective of survey design 
is to maximize the accuracy of the estimate (or, more generally, to minimize, 
in some sense, the losses that may result from the fact that the estimate 
will almost certainly not equal the true value), it follows that the expenditures for the different parts of the survey should be allocated with this objective in mind.

6. Why probability methods should be used in selecting samples. There 
are many possible methods of selecting a sample from a population. 
Some of these depend on the judgment of people who claim to know the 
population; others merely consist in defining the sample to be the part 
of the population that is most conveniently available; others (and it is 
these that we shall study in the following pages) are based on the use of 
the theory of probability. In applying these probability methods, the 
following two points must be kept in mind as the justification of their use 
and as the condition of their valid application: 

(a) Methods of selecting samples based on the theory of probability are
the only general methods known to us which can provide a measure 
of precision. Only by using probability methods can objective 
numerical statements be made concerning the precision of the results 
of the survey. 

(b) It is necessary to be sure that the conditions imposed by the use of
probability methods are satisfied. It is not enough to hope or 
expect that they are. Steps must be taken to meet these conditions 
by selecting methods that are tested and are demonstrated to 
conform to the probability model. 

It should be obvious that we are not presenting methods based on 
probability theory as just one more means of selecting samples. Rather 
we assert that, with rare exceptions, the precision of estimates not based 
on known probabilities of selecting the samples cannot be predicted before 
the survey is made, nor can the probabilities or precision be estimated 
after the sample is obtained. If we know nothing of the precision, then 
we do not know whether to have much faith in our estimates, even though 
highly accurate measurements are made on the units in the sample. 
Hence, when the information to be obtained is of real importance, it will 
be desirable to choose methods based on the use of the theory of 
probability. 

It is sometimes argued that any sample is selected by probability 
methods; that the interviewers who select a sample by approaching 
people they meet in certain localities are using probability methods when 
they do their jobs as they should, not selecting more than designated 
numbers from one specified group or another. Such probabilities are, 
however, unknown; they may vary from enumerator to enumerator, and 
they may vary over time. It is not adequate to test to see whether the probabilities seem to be what they should, because the conditions existing at the next application of the procedure will differ from what they were during the test. Considerable evidence exists that only carefully tested methods of selection which are capable of being repeated can be depended upon to yield either equal probabilities or any other specified probabilities of selection.

We ask for greater care in selecting samples than is customary in many other applications of statistics. Yet in these other branches of statistics we would ask for such care if it were physically, administratively, and economically possible, as it usually is in survey design. One of the most
important facts about the selection of samples is that, if a list can be 
constructed and if proper methods of selection are used, we need not 
guess, we can determine the probabilities of selecting the possible samples. 
Furthermore, the increases in precision and the ability to measure precision 
that thus become available will in general more than repay the cost of 
applying these methods. 

The uses to which some survey results will be put are sufficiently crude 
so that almost any method for selecting the sample will yield satisfactory 
information. Obviously, the cheapest method that meets the purposes 
of the survey should be chosen. In some instances probability sampling 
methods may not be feasible. Probability statements concerning the 
precision of surveys should, however, not be made unless probability
methods have been employed. It may be very misleading to apply 
probability statements to nonprobability surveys. In fact, the need for 
such statements might be taken as the test of whether, if feasible, the 
sample should be selected on a probability basis. 

Exercises 

6.1. Assume that a sample of families is selected by means of probability methods of selection; questionnaires are sent to them by mail, and the information is then obtained only for those families that return the questionnaire. Is this a probability method of sampling in the sense that there are known probabilities for specified families being in the sample? Can a population be defined from which it is a probability sample?

6.2. Sometimes a sample of people is selected by first estimating the proportions of the population that are in certain classes, e.g., 52 per cent female, 48 per cent male; then distributing the sample in those same proportions; and finally asking the interviewers to find specified numbers of people in each of these categories, these specified numbers being so chosen that the total sample is in the correct proportions. What are the problems with this kind of sampling? Under what conditions would it be good or bad? Is it a probability method of sampling?


CHAPTER 2 


Fundamental Theory—Operations, 
Events, and Probability 


1. Introduction. In this chapter, we shall give the minimum introduction to the theory of probability that permits us to develop the theorems that we need for the theory of expected values. Of these, Theorems 3 and 5 of Sec. 6 (p. 28), on the probabilities of “product” and “sum” events, respectively, will be found exceedingly useful.

The notions of probability and the operations of selecting an element 
from a population, discussed in Sec. 3 and 4, must be clearly understood 
and related to sampling. If not, much of the remainder of this chapter 
will appear to be a formal treatment unrelated to reality. 

As a preliminary we introduce summation notation in Sec. 2. This 
notation is used throughout the volume. 

2. Summation notation: why we study summation notation. A knowledge of summation notation is very useful in statistics in the following ways:

(a) It provides a convenient shorthand for expressions that would
otherwise be very cumbersome. 

(b) By proving theorems concerning the notation itself we obtain 
results that would otherwise need to be proved in many special cases. 

The integers. We will refer to the positive integers $1, 2, \ldots$, the negative integers $-1, -2, \ldots$, and zero as integers.

Summation notation. The symbol $\Sigma$ is the Greek capital letter “sigma” and is the notation used to indicate summation. The expression $\sum_{i=1}^{M} f(i)$ stands for $f(1) + f(2) + \cdots + f(M)$, and the expression

$$\sum_{i=1}^{M}\sum_{j=1}^{N_i} f(i, j) = f(1, 1) + \cdots + f(1, N_1) + f(2, 1) + \cdots + f(2, N_2) + \cdots + f(M, 1) + \cdots + f(M, N_M)$$

whatever may be the functions $f(i)$ and $f(i, j)$.
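The two definitions translate directly into loops. This Python sketch is an illustration only; `single_sum` and `double_sum` are hypothetical helper names, and the inner upper limits $N_1, \ldots, N_M$ are passed as a list so that they may differ from one value of $i$ to the next.

```python
def single_sum(f, M):
    """sum_{i=1}^{M} f(i) = f(1) + f(2) + ... + f(M)."""
    return sum(f(i) for i in range(1, M + 1))

def double_sum(f, N):
    """sum_{i=1}^{M} sum_{j=1}^{N_i} f(i, j), where N = [N_1, ..., N_M]."""
    return sum(f(i, j)
               for i, N_i in enumerate(N, start=1)   # i runs from 1 to M = len(N)
               for j in range(1, N_i + 1))           # j runs from 1 to N_i

# Hypothetical functions: f(i) = i**2 with M = 4, and f(i, j) = 10*i + j with N = [2, 3].
print(single_sum(lambda i: i ** 2, 4))               # 1 + 4 + 9 + 16 = 30
print(double_sum(lambda i, j: 10 * i + j, [2, 3]))   # 11 + 12 + 21 + 22 + 23 = 89
```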


Whenever the summation is from 1 to some indicated value, the “$= 1$” is omitted below the summation sign. Thus $\sum_{i}^{N}$ has the same meaning as $\sum_{i=1}^{N}$. If the lower limit of summation is different from 1, it will be indicated.

Illustration 2.1. The heights in inches of 5 people are measured and turn out to be 67, 72, 63, 68, 70. They are denoted by $x_1, x_2, x_3, x_4$, and $x_5$. The mean or average height is calculated by

$$\frac{67 + 72 + 63 + 68 + 70}{5} = \frac{340}{5} = 68$$

which may be more concisely written

$$\bar{x} = \frac{1}{5}\sum_{i}^{5} x_i = 68$$

In general, the arithmetic mean $\bar{x}$ is defined by the equation

$$\bar{x} = \frac{1}{N}\sum_{i}^{N} x_i$$

if there are $N$ observations. It is important to realize that $\bar{x}$ is not a function of $i$. It would have the same value if $\bar{x}$ were written

$$\bar{x} = \frac{1}{N}\sum_{j}^{N} x_j$$
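A small Python check of Illustration 2.1 (illustrative only; Python counts positions from 0, so `heights[0]` plays the role of $x_1$). Renaming the summation index from `i` to `j` leaves the mean unchanged, which is the point that $\bar{x}$ is not a function of $i$.

```python
heights = [67, 72, 63, 68, 70]                      # x_1, ..., x_5 in inches
N = len(heights)
x_bar_i = sum(heights[i] for i in range(N)) / N     # (1/N) * sum over index i
x_bar_j = sum(heights[j] for j in range(N)) / N     # the same sum over index j
print(x_bar_i, x_bar_j)                             # 68.0 68.0
```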


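Exercises

2.1. Show that

$$\sum_{i}^{N} P_i x_i = \sum_{j}^{N} P_j x_j = \sum_{k}^{N} P_k x_k$$

2.2. Evaluate $\sum_{i}^{N} (x_i - \bar{x})$, where $\bar{x} = \frac{1}{N}\sum_{i}^{N} x_i$.

2.3. If $a_i = a$, $i = 1, 2, \ldots, N$, evaluate $\sum_{i}^{N} a_i$, i.e., $\sum_{i}^{N} a$.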

N 

2.4. Write out 

i 

2.5. If $f(i) = a + bx_i + cy_i$, evaluate $\sum_{i}^{5} f(i)$ in terms of sums of $x$ and $y$.


2.6. If Zi = xpi + yqi, where jpi = evaluate %Zi. 

i i i 

2.7. If $f(i) = (x_i - \bar{x})^2$, where $i = 1, 2, 3$, and $x_1 = 0$, $x_2 = 4$, $x_3 = 2$, then evaluate $\sum_{i}^{3} f(i)$.





2.8. Show that

$$\sum_{i}^{N-1} x_i = \sum_{i}^{N} x_i - x_N$$

Illustration 2.2. Summation with respect to two subscripts. Let us evaluate the product $\sum_{i}^{M} x_i \sum_{j}^{N} y_j$. This is, by definition, equal to

$$(x_1 + \cdots + x_M)(y_1 + \cdots + y_N)$$

or

$$x_1(y_1 + \cdots + y_N) + \cdots + x_M(y_1 + \cdots + y_N)$$

or

$$x_1 y_1 + \cdots + x_1 y_N + x_2 y_1 + x_2 y_2 + \cdots + x_M y_N$$





where $\sum_{i<j}^{M}$ means that the summation extends over all possible values of $i$ and $j$ for which $i < j$, so that




For example, if $M = 3$, we have

$$\sum_{i \neq j}^{3} x_i x_j = x_1x_2 + x_1x_3 + x_2x_1 + x_2x_3 + x_3x_1 + x_3x_2$$

and

$$\sum_{i<j}^{3} x_i x_j = x_1x_2 + x_1x_3 + x_2x_3$$
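The identities of Illustration 2.2 can be spot-checked numerically. This Python sketch is a check on particular numbers, not a proof; `check_identities` is a hypothetical name. It compares the product of two sums with the double sum, confirms that the sum over $i \neq j$ counts each pair of the sum over $i < j$ twice (as the $M = 3$ example shows), and checks the standard consequence $(\sum x_i)^2 = \sum x_i^2 + 2\sum_{i<j} x_i x_j$.

```python
from itertools import combinations

def check_identities(x, y):
    """Return (product rule, i != j rule, square rule) checks for lists x and y."""
    # (sum_i x_i)(sum_j y_j) equals the double sum of x_i * y_j over all i and j.
    product_rule = sum(x) * sum(y) == sum(xi * yj for xi in x for yj in y)
    # The sum over i != j contains each i < j pair twice (x_i x_j and x_j x_i).
    M = len(x)
    ne = sum(x[i] * x[j] for i in range(M) for j in range(M) if i != j)
    lt = sum(a * b for a, b in combinations(x, 2))   # pairs taken with i < j
    pair_rule = ne == 2 * lt
    # Consequence: (sum x_i)^2 = sum x_i^2 + 2 * sum_{i<j} x_i x_j.
    square_rule = sum(x) ** 2 == sum(v * v for v in x) + 2 * lt
    return product_rule, pair_rule, square_rule

print(check_identities([1, 2, 3], [4, 5]))   # (True, True, True)
```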


Exercises 

2.9. Prove Eq. 2.2 and 2.3 of this section. 

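2.10. Prove that

$$\sum_{i}^{M}\sum_{j}^{N} x_{ij} = \sum_{j}^{N}\sum_{i}^{M} x_{ij} = \sum_{i}^{M} x_{i\cdot} = \sum_{j}^{N} x_{\cdot j}$$

where $x_{i\cdot} = \sum_{j}^{N} x_{ij}$ and $x_{\cdot j} = \sum_{i}^{M} x_{ij}$, by writing out the summations and matching terms. (This is called inverting the order of summation.)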

2.11. Prove that

$$\sum_{i}^{M}\sum_{j}^{i} x_{ij} = \sum_{j}^{M}\sum_{i=j}^{M} x_{ij}$$

by writing out both summations and matching terms.

2.12. Show that in general

$$\sum_{i}^{M} x_i y_i \neq \sum_{i}^{M} x_i \sum_{i}^{M} y_i$$

although equality may occur for specified values of the $x$'s and $y$'s.

2.13. Show that

$$\sum_{i}^{M} (x_i + y_i) = \sum_{i}^{M} x_i + \sum_{i}^{M} y_i$$

2.14. Show that in general

$$\frac{1}{2}\sum_{i}^{2} \frac{x_i}{y_i} \neq \frac{\frac{1}{2}\sum_{i}^{2} x_i}{\frac{1}{2}\sum_{i}^{2} y_i}$$

although equality may occur for specified values of the $x$'s and $y$'s. (This result is also true when 2 is replaced by $M$.)

2.15. Show that

$$\sum_{i}^{M} (ax_i + by_i)^2 = a^2\sum_{i}^{M} x_i^2 + 2ab\sum_{i}^{M} x_i y_i + b^2\sum_{i}^{M} y_i^2$$


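2.16. Show that

$$\sum_{i}^{M} (a_0 + a_1x_{1i} + a_2x_{2i} + \cdots + a_Kx_{Ki}) = Ma_0 + a_1\sum_{i}^{M} x_{1i} + \cdots + a_K\sum_{i}^{M} x_{Ki} = Ma_0 + a_1x_{1\cdot} + \cdots + a_Kx_{K\cdot}$$

where

$$x_{1\cdot} = \sum_{i}^{M} x_{1i}, \quad \ldots, \quad x_{K\cdot} = \sum_{i}^{M} x_{Ki}$$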

3. The notion of probability. Experience suggests that many an 
operation is such that, when the operation is carried out under suitably 
controlled conditions, it is impossible to predict exactly which of the several
possible results of the operation will occur on a particular performance, 
even if one has complete information about the outcomes of preceding 
performances. Nevertheless, experience also indicates that we may 
expect a high degree of stability in the proportion of times a particular result 
will occur in a sufficiently long series of performances of the operation. 
Illustrations may be found in such operations as tossing a coin or in the 
production of goods after the process of production is in a state of control. 

Our ability to use data obtained from a sample in order to make 
inferences about a larger universe from which the sample has been drawn 
depends upon our ability to select the sample by means of an operation 
having the above-mentioned properties. Experience has indicated that 
such operations may be based on the tables of random numbers, and it 
is for that reason that these tables are important in sampling practice. 

In order to develop the properties of such an operation quantitatively to the extent necessary for their application to sampling work, it is convenient to associate numerical values with the possible outcomes of the operation, in the following manner: if the operation has $K$ possible outcomes $A_1, \ldots, A_K$, and $n_i/n$ is the proportion of times that the outcome $A_i$ is observed in a series of $n$ trials, and if we can expect the proportion $n_i/n$ to be arbitrarily close to a number $P_i$ independent of $n$ provided $n$ is sufficiently large, we shall say that the probability* of $A_i$ is $P_i$. The numbers $P_i$ are clearly non-negative and their sum is 1, since

$$n_1 + \cdots + n_K = n$$
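The stability of relative frequencies can be illustrated by simulation. In this Python sketch a seeded pseudo-random generator stands in for the physical operation (an assumption made only for illustration), and `relative_frequencies` is a hypothetical name. It counts $n_i$ for each outcome, confirms $n_1 + \cdots + n_K = n$, and reports $n_i/n$.

```python
import random
from collections import Counter

def relative_frequencies(outcomes, weights, n, seed=0):
    """Simulate n performances of an operation; return n_i / n for each outcome A_i."""
    rng = random.Random(seed)                       # seeded so the run is repeatable
    counts = Counter(rng.choices(outcomes, weights=weights, k=n))
    assert sum(counts.values()) == n                # n_1 + ... + n_K = n
    return {a: counts[a] / n for a in outcomes}

# A simulated "true" coin: the proportion of heads settles near 0.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequencies(["H", "T"], [0.5, 0.5], n))
```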

We shall often refer to a possible result as an elementary event. Thus, if $A_1, \ldots, A_K$ are the possible results of an operation, then they also are $K$ elementary events, one and only one of which will occur when the operation is performed. If $A_i$ occurs when the operation is performed, we say that the elementary event $A_i$ has occurred. We shall denote the set of all possible results or elementary events associated with an operation by $\mathcal{A}$.

* For a more rigorous development of the theory of probability, the reader should consult W. Feller (2).

By an event, we shall mean a subset of the set, $\mathcal{A}$, of elementary events. Thus, $A^*$ may consist of, say, $A_2$, $A_4$, and $A_7$. We shall say that the event $A^*$ occurs if and only if one of its constituent elementary events occurs. In the above illustration, $A^*$ would occur if and only if $A_2$ or $A_4$ or $A_7$ occurred. The event $A^*$ may be $\mathcal{A}$ itself, or a specific $A_i$, or any subset of $\mathcal{A}$. The complementary event to $A^*$ is designated by $\bar{A}^*$ and consists of all the elementary events of $\mathcal{A}$ that are not in $A^*$ and none of the elementary events of $\mathcal{A}$ that are in $A^*$.

Abstraction from experience suggests the following definition of the probability of an event, since in $n$ performances of the operation the relative frequency of occurrence of $A^*$ will be the sum of the relative frequencies of occurrence of the elementary events that constitute $A^*$.

Definition. If $A^*$ is an event consisting of the elementary events $A_{i_1}, \ldots, A_{i_k}$, then we define the probability of $A^*$ to be

$$Pr(A^*) = P_{i_1} + P_{i_2} + \cdots + P_{i_k}$$

where $i_1, \ldots, i_k$ are $k$ of the $K$ integers $1, \ldots, K$.

Thus, if the set $\mathcal{A}$ consists of $K$ possible results all of which have equal probability, i.e., $P_i = 1/K$, $i = 1, \ldots, K$, and if the set $A^*$ consists of $k$ of these elementary events (possible results), then, by the definition of probability,

$$Pr(A^*) = \frac{k}{K}$$

Since $A^*$ and $\bar{A}^*$ together include every element of $\mathcal{A}$, with no element in both, and since $Pr(\mathcal{A}) = 1$, it follows that

$$Pr(A^*) + Pr(\bar{A}^*) = 1$$
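The definition amounts to adding elementary probabilities. This Python sketch assumes a hypothetical operation with $K = 8$ equally likely results and uses the event $A^* = \{A_2, A_4, A_7\}$ from the text; `pr` is an illustrative helper name. Exact fractions avoid rounding.

```python
from fractions import Fraction

def pr(event, P):
    """Pr(A*) = sum of P_i over the elementary events A_i that make up A*."""
    return sum((P[a] for a in event), Fraction(0))

# Hypothetical operation with K = 8 equally likely results A1, ..., A8.
K = 8
P = {f"A{i}": Fraction(1, K) for i in range(1, K + 1)}
A_star = {"A2", "A4", "A7"}                 # the event used as an example in the text
A_star_bar = set(P) - A_star                # the complementary event
print(pr(A_star, P), pr(A_star_bar, P))     # 3/8 5/8
```

Here $k = 3$ and $K = 8$, so $Pr(A^*) = k/K = 3/8$, and the complementary event supplies the remaining $5/8$.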

4. Probability selection methods and the equal probability selection method. The operation that is basic in sampling is the selection of one element from a population in such a way that each element has a known probability of being selected. To do this, it must be possible to assign the appropriate probabilities to all possible results of the operation. If the elements of a population are $A_1, A_2, \ldots, A_N$, then the selection of one element from that population has one of the possible results: $A_1$ is selected, or $A_2$ is selected, or $\ldots$, or $A_N$ is selected. If each of the possible results of the selection has a known probability, we say we have a probability selection method. More specifically, by a probability selection method we shall mean an operation applied to the elements of a population such that when the operation is performed one and only one element of the population is selected; the probability that the element $A_i$ is selected from the population consisting of $A_1, \ldots, A_N$ is $P_i$, $i = 1, \ldots, N$, where $P_i > 0$, $P_1 + \cdots + P_N = 1$, and the $P_i$ are known numbers. A selection method is called an equal probability selection method or epsem if $P_1 = P_2 = \cdots = P_N = 1/N$, i.e., if all elements are equally likely to be selected. The sample itself may be selected by first selecting one element, then another, and so on, or by defining an auxiliary population consisting of all the possible samples and selecting one of the elements of this population to be the sample. A widely accepted practice of assigning probabilities to the results of a selection method is the use of a table of random numbers. (See Sec. 8 for a detailed description of the properties of tables of random numbers and illustrations of their use. Chapter 4 of Vol. I gives some further illustrations.)

Illustration 4.1. Assume that the elements of the population $\mathcal{A}$ are $A_1$, $A_2$, $A_3$, and that we wish to select two elements from the population with equal probability. We can select the first element with equal probability, and then select the second element with equal probability from the remaining two. In this selection method, the possible results of the first selection are

$A_1$, $A_2$, $A_3$

The possible results of the second selection are

$A_2$ or $A_3$ if $A_1$ is selected on the first selection,

$A_1$ or $A_3$ if $A_2$ is selected on the first selection, and

$A_1$ or $A_2$ if $A_3$ is selected on the first selection.

Hence, the possible results of first selecting an element by an epsem, and then making the second selection by an epsem from among the remaining two elements in the population, are

$B_1 = A_1, A_2$    $B_4 = A_2, A_3$
$B_2 = A_1, A_3$    $B_5 = A_3, A_1$
$B_3 = A_2, A_1$    $B_6 = A_3, A_2$

Instead of selecting the first element from the original population and then selecting the second element from the remaining elements of the population, we can also select two elements by defining another population, $\mathcal{B}$, whose elements are the pairs listed above. We can then select one of the elements $B_1, B_2, \ldots, B_6$. If, say, $B_3$ is the element selected from $\mathcal{B}$, then the sample consists of $A_2$ and $A_1$. Thus, the elements of $\mathcal{B}$ are the possible samples of elements of $\mathcal{A}$. Note that, since each pair is selected with equal probability, the probability of selecting any one of the elements of $\mathcal{B}$ is $\frac{1}{3} \cdot \frac{1}{2} = \frac{1}{6}$.

The selection method illustrated here is a special case of simple random sampling without replacement, which is defined later in this section.

Illustration 4.2. Two coins are tossed. The elements of $\mathcal{A}$ are $A_1$, $A_2$, $A_3$, and $A_4$, where $A_1$ is HH, $A_2$ is HT, $A_3$ is TH, and $A_4$ is TT; HH stands for two heads; HT stands for the first coin a head, the second coin a tail; TH stands for the first coin a tail and the second coin a head; and finally TT stands for two tails. In this illustration, the operation consists of tossing two coins; the results of the operation are given by $A_1$, $A_2$, $A_3$, and $A_4$ above. Finally, the probability of each of the results for “true” coins is $\frac{1}{4}$.

Illustration 4.3. Two coins are tossed. The results of this operation, $A_1$, $A_2$, $A_3$, and $A_4$, are as defined in Illustration 4.2. Also assume that the coins are “true,” so that each of the results $A_1$, $A_2$, $A_3$, $A_4$ has probability $\frac{1}{4}$. Let us now find the probability of obtaining exactly one head. In this illustration, $A^*$ consists of HT and TH. By the definition of the probability of an event

$$Pr(A^*) = P_{HT} + P_{TH} = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}$$

Similarly, for finding the probability of at least one head, the event $A^*$ consists of HT, TH, and HH. Hence $Pr(A^*) = \frac{3}{4}$.
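The two-coin calculations of Illustrations 4.2 and 4.3 can be reproduced by listing the elementary events. This is an illustrative Python sketch; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Elementary events for two "true" coins: HH, HT, TH, TT, each with probability 1/4.
outcomes = ["".join(t) for t in product("HT", repeat=2)]
P = {o: Fraction(1, 4) for o in outcomes}

exactly_one_head = [o for o in outcomes if o.count("H") == 1]   # HT, TH
at_least_one_head = [o for o in outcomes if "H" in o]            # HH, HT, TH
print(sum(P[o] for o in exactly_one_head))    # 1/2
print(sum(P[o] for o in at_least_one_head))   # 3/4
```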

Illustration 4.4. Three blocks $B_1$, $B_2$, and $B_3$ contain 3, 8, and 1 houses. To select one block by the method called selection with probability proportionate to size, we proceed as follows: select one of the integers $1, 2, \ldots, 12$ by an epsem; if the selected integer is 1, 2, or 3, then $B_1$ is selected; if the selected integer is $4, 5, \ldots, 11$, then $B_2$ is selected; and if the selected integer is 12, then $B_3$ is selected. What are the probabilities of selection for each block?

Since the selection of integers is by an epsem, it follows that $Pr(i) = \frac{1}{12}$, $i = 1, \ldots, 12$. Hence, by the definition of probability,

$$Pr(\text{selecting } B_1) = \tfrac{3}{12}, \quad Pr(\text{selecting } B_2) = \tfrac{8}{12}, \quad Pr(\text{selecting } B_3) = \tfrac{1}{12}$$

For further theory and application of probability proportionate to size, see Ch. 8 (Vol. I and II) and Ch. 9 (Vol. I and II).
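The integer-assignment device of Illustration 4.4 can be sketched in Python. This is illustrative only: `pps_ranges` and `select_block` are hypothetical names, and Python's seeded pseudo-random generator stands in for a table of random numbers.

```python
import random
from fractions import Fraction

def pps_ranges(sizes):
    """Give unit k a run of sizes[k] consecutive integers, starting from 1."""
    ranges, start = {}, 1
    for unit, size in sizes.items():
        ranges[unit] = range(start, start + size)
        start += size
    return ranges

sizes = {"B1": 3, "B2": 8, "B3": 1}       # houses per block (Illustration 4.4)
ranges = pps_ranges(sizes)                # B1 -> 1..3, B2 -> 4..11, B3 -> 12
total = sum(sizes.values())               # 12 integers, one drawn by an epsem

def select_block(rng):
    """Draw an integer 1..total with equal probability; return the block that owns it."""
    i = rng.randint(1, total)
    return next(unit for unit, r in ranges.items() if i in r)

# Probability of each block = (number of integers assigned to it) / total.
probs = {unit: Fraction(len(r), total) for unit, r in ranges.items()}
print(probs, select_block(random.Random(1)))
```

Assigning consecutive runs of integers is one simple way to make each block's chance proportional to its size; any other assignment with the same counts would give the same probabilities.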

a. Definition of sampling without replacement. If a population consists of $N$ elements, and if a sample of $n$ elements is obtained by first selecting one of the $N$ elements, and, without replacing it, selecting one of the remaining $N - 1$ elements, and, without replacing the two selected elements, selecting one of the remaining $N - 2$ elements, and so on, so that at the $n$th selection there are $N - n + 1$ elements, then we say that the sample has been selected without replacement. Since there are $N$ possible results of the first selection, $N - 1$ possible results of the second selection, $\ldots$, and $N - n + 1$ possible results of the $n$th selection, it follows that there are $N(N-1)\cdots(N-n+1)$ possible results of the $n$ selections. (This evaluation of the number of possible results follows from Theorem A.2, p. 37, of the Appendix to this chapter.)

As a special case, if we let $N = n$, we have $n! = n(n-1)(n-2)\cdots 1$ as the number of possible orders in which a specified set of $n$ elements may be selected without replacement. Each possible selection is then simply an arrangement of the $n$ elements in the order of selection.
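A short Python check of the count $N(N-1)\cdots(N-n+1)$, using a hypothetical population of 5 elements; `ordered_selections` is an illustrative name. The brute-force count comes from `itertools.permutations`, which lists the possible ordered results of $n$ selections without replacement.

```python
import math
from itertools import permutations

def ordered_selections(N, n):
    """N(N-1)...(N-n+1): number of possible results of n selections without replacement."""
    result = 1
    for k in range(n):
        result *= N - k
    return result

# Brute-force check on a small hypothetical population of N = 5 elements with n = 3.
population = ["A1", "A2", "A3", "A4", "A5"]
assert ordered_selections(5, 3) == len(list(permutations(population, 3))) == 60
# Special case N = n: the n! orders in which n specified elements can be selected.
assert ordered_selections(4, 4) == math.factorial(4) == 24
```

In Python 3.8 and later the standard library function `math.perm(N, n)` returns the same number.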

b. Definition of simple random sampling without replacement. If the method of selecting a sample of $n$ elements from $N$ elements is such that each of the $C_n^N$ possible $n$-combinations of elements is equally likely to be selected, then the sampling plan is called a simple random sampling plan without replacement or, simply, simple random sampling. It will be assumed that the term simple random sampling applies to sampling without replacement unless otherwise qualified. The symbol $C_n^N$ is the number of combinations of $N$ things taken $n$ at a time, and is referred to as the number of $n$-combinations. (Combinations and permutations are discussed in the Appendix to this chapter, which should now be read by the student who is not already acquainted with the subject.)

Thus, a simple random sampling plan is such that each of the $C_n^N$ combinations has probability $1/C_n^N$ of being the sample actually selected. Also, if $A^*$ is an event occurring if any of a specified $N_{A^*}$ combinations (where $N_{A^*}$ is the number of elements of $A^*$) is selected, then by the definition of probability (Sec. 3 of this chapter)

$$Pr(A^*) = \frac{N_{A^*}}{C_n^N}$$

For example, if the elements of the population are $A_1, \ldots, A_N$, and if $A^*$ is the event “the sample of size $n$ contains $A_i$,” then there are $C_{n-1}^{N-1}$ of the possible samples of $n$ elements containing the element $A_i$, so that $N_{A^*} = C_{n-1}^{N-1}$ and

$$Pr(A^*) = \frac{C_{n-1}^{N-1}}{C_n^N} = \frac{n}{N}$$

If $A^*$ is the event “the sample of size $n$ contains $A_1, A_2, \ldots, A_m$ (or any other specified $m$ elements),” then

$$Pr(A^*) = \frac{C_{n-m}^{N-m}}{C_n^N} = \frac{n(n-1)\cdots(n-m+1)}{N(N-1)\cdots(N-m+1)}$$
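The two formulas just derived can be confirmed by listing every possible sample for small $N$ and $n$. This Python sketch is a brute-force check under the simple random sampling definition above; `pr_contains` and `pr_formula` are hypothetical names, and the specified elements are taken to be $A_1, \ldots, A_m$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def pr_contains(N, n, m):
    """Brute force: Pr(a simple random sample of n from A_1..A_N contains A_1..A_m)."""
    samples = list(combinations(range(1, N + 1), n))   # all C(N, n) samples, equally likely
    specified = set(range(1, m + 1))                    # the m specified elements
    hits = sum(1 for s in samples if specified <= set(s))
    return Fraction(hits, len(samples))

def pr_formula(N, n, m):
    """C(N-m, n-m) / C(N, n), which equals n(n-1)...(n-m+1) / [N(N-1)...(N-m+1)]."""
    return Fraction(comb(N - m, n - m), comb(N, n))

print(pr_contains(8, 3, 1), pr_formula(8, 3, 1))   # 3/8 3/8
```

With $m = 1$ this reproduces $n/N = 3/8$.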






Theorem 1. In sampling without replacement, if each of the $n$-permutations of $N$ elements has equal probability of being selected to be the sample, then each $n$-combination of $N$ elements has probability $1/C_n^N$ of being selected.

Proof. There are $N!/(N-n)!$ possible $n$-permutations of $N$ elements, so that if they have equal probability, then each has probability $(N-n)!/N!$. Each of the $n$-permutations is an element of $\mathcal{A}$. If $A^*$ is the event occurring if a specified $n$-combination is selected, then there are $n!$ elements of $\mathcal{A}$ in $A^*$. Hence, by the definition of the probability of an event (Sec. 3),

$$Pr(A^*) = n!\,\frac{(N-n)!}{N!} = \frac{1}{C_n^N}$$
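Theorem 1 can be verified for small cases by giving every $n$-permutation the same probability and collecting the permutations by the combination they contain. This Python sketch is illustrative; `combination_probs` is a hypothetical name, and $N = 3$, $n = 2$ corresponds to Illustration 4.1.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import comb

def combination_probs(N, n):
    """Give each n-permutation probability (N-n)!/N! and sum over orderings."""
    perms = list(permutations(range(1, N + 1), n))
    p_perm = Fraction(1, len(perms))          # equal probability per permutation
    probs = Counter()
    for perm in perms:
        probs[frozenset(perm)] += p_perm      # n! permutations map to each combination
    return probs

probs = combination_probs(3, 2)   # N = 3, n = 2, as in Illustration 4.1
print(dict(probs))
```

Each of the 3 possible pairs receives $2 \times \frac{1}{6} = \frac{1}{3} = 1/C_2^3$.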


Thus, one way to select a simple random sample of $n$ elements is to give each $n$-permutation an equal chance of being selected.

The probability is $1/N$ that, in a sample selected by simple random sampling, the element $A_i$ is the $j$th element selected. For there are $N!/(N-n)!$ $n$-permutations, and the number of $n$-permutations in which $A_i$ is not selected before the $j$th selection and is selected at the $j$th selection is, by Theorem A.2 (p. 37) of the Appendix, $(N-1)\cdots(N-n+1)$, no matter what the value of $j$. Then

$$Pr(A_i \text{ is the } j\text{th selected element}) = \frac{(N-1)\cdots(N-n+1)}{N(N-1)\cdots(N-n+1)} = \frac{1}{N}$$

Note that this is equivalent to saying that if $n$ selections are made without replacement, then the probability of $A_i$ being selected at any specified one of the $n$ selections is equal to $1/N$. (This probability is also evaluated later in this chapter using conditional probability.)

Similarly,

$$Pr(A_1, \ldots, A_M \text{ are, in that order, the first } M \text{ selected elements}) = \frac{(N-M)(N-M-1)\cdots(N-n+1)}{N(N-1)\cdots(N-M+1)(N-M)(N-M-1)\cdots(N-n+1)} = \frac{1}{N(N-1)\cdots(N-M+1)}$$

(This probability is also evaluated later in this chapter using conditional probability.)
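Both results can be checked by enumerating all equally likely $n$-permutations for a small hypothetical case ($N = 6$, $n = 4$). The Python function names below are illustrative, not from the text; positions are counted from 1 as in the text, while Python tuples are indexed from 0.

```python
from fractions import Fraction
from itertools import permutations

def pr_position(N, n, i, j):
    """Pr(A_i is the j-th element selected) when all n-permutations are equally likely."""
    perms = list(permutations(range(1, N + 1), n))
    return Fraction(sum(1 for p in perms if p[j - 1] == i), len(perms))

def pr_first_M(N, n, M):
    """Pr(A_1, ..., A_M are, in that order, the first M selected elements)."""
    perms = list(permutations(range(1, N + 1), n))
    target = tuple(range(1, M + 1))
    return Fraction(sum(1 for p in perms if p[:M] == target), len(perms))

N, n = 6, 4
print([pr_position(N, n, 2, j) for j in range(1, n + 1)])   # each entry equals 1/6
print(pr_first_M(N, n, 2))                                  # 1/30, i.e. 1/(6 * 5)
```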

c. Definition of sampling with replacement. Assume that the population consists of $N$ elements. If a sample of $n$ elements is obtained by first selecting one of the $N$ elements, replacing it, then making a second selection and replacing the element before making a third selection, etc., until $n$ selections are made, then we say that the sample is selected with replacement. Since there are $N$ possible results in each of the selections, the number of possible results of two selections is $N^2$, of three selections, $N^3$, and of $n$ selections, $N^n$. Note that there is no restriction on the number of times a particular element may be included in the sample.

d. Definition of simple random sampling with replacement. If the sampling is done with replacement, and each element has probability $1/N$ of being selected at each selection, then we call the system of selection simple random sampling with replacement. Now let us find $Pr(A^*)$, where $A^*$ is the event “the sample of size $n$ contains the element $A_i$ at least once”; we have

$N^n$ = number of samples,
$(N-1)^n$ = number of samples that do not contain $A_i$,
$N^n - (N-1)^n$ = number of samples that contain $A_i$ at least once,

and hence

$$Pr(A^*) = \frac{N^n - (N-1)^n}{N^n} = 1 - \left(\frac{N-1}{N}\right)^n$$
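A Python check of the formula by listing all $N^n$ equally likely ordered samples drawn with replacement. The function names are hypothetical, and the check uses small values of $N$ and $n$ only.

```python
from fractions import Fraction
from itertools import product

def pr_at_least_once_formula(N, n):
    """1 - ((N - 1) / N)**n, from counting the N**n and (N - 1)**n samples."""
    return 1 - Fraction(N - 1, N) ** n

def pr_at_least_once_brute(N, n, i=1):
    """Enumerate all N**n ordered samples drawn with replacement; count those containing A_i."""
    samples = list(product(range(1, N + 1), repeat=n))
    return Fraction(sum(1 for s in samples if i in s), len(samples))

print(pr_at_least_once_formula(4, 3), pr_at_least_once_brute(4, 3))   # 37/64 37/64
```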


Illustration 4.5. Let a population consist of $L$ sets of elements such that each element of the population is in one and only one of these sets. Let the $i$th of these $L$ sets consist of $N_i$ elements, $i = 1, \ldots, L$, where $N = N_1 + N_2 + \cdots + N_L$. If a simple random sample of $n$ elements is selected, let us find $Pr(A^*)$, where $A^*$ is the event “the $n$ elements are so distributed that $n_1$ will fall in class 1, $n_2$ in class 2, $\ldots$, $n_L$ in class $L$,” with $n_1 + n_2 + \cdots + n_L = n$.

The number of ways in which $n_i$ elements can be selected from $N_i$ is $C_{n_i}^{N_i}$. Hence the number of samples in which there will be exactly $n_1$ in class 1, $n_2$ in class 2, $\ldots$, $n_L$ in class $L$ will be $C_{n_1}^{N_1} C_{n_2}^{N_2} \cdots C_{n_L}^{N_L}$. Since there are $C_n^N$ possible samples of $n$ elements, it follows that

$$Pr(A^*) = \frac{C_{n_1}^{N_1} C_{n_2}^{N_2} \cdots C_{n_L}^{N_L}}{C_n^N}$$

Thus, if a simple random sample of 10 elements is selected from a population consisting of 20 elements of one kind and 80 elements of another, then the probability that the sample contains 4 elements of the first kind and 6 of the second is

$$\frac{C_4^{20}\, C_6^{80}}{C_{10}^{100}}$$
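The probability in Illustration 4.5 can be evaluated exactly with integer binomial coefficients. This Python sketch is illustrative; `pr_allocation` is a hypothetical name, and `math.comb` and `math.prod` require Python 3.8 or later.

```python
from fractions import Fraction
from math import comb, prod

def pr_allocation(N_sizes, n_counts):
    """Pr(n_1 in class 1, ..., n_L in class L) under simple random sampling.

    N_sizes = [N_1, ..., N_L] and n_counts = [n_1, ..., n_L]; N and n are their sums.
    """
    N, n = sum(N_sizes), sum(n_counts)
    numerator = prod(comb(N_h, n_h) for N_h, n_h in zip(N_sizes, n_counts))
    return Fraction(numerator, comb(N, n))

p = pr_allocation([20, 80], [4, 6])   # the text's example: 10 drawn from 20 + 80
print(float(p))
```

Summing the same expression over all possible splits $(k, 10 - k)$ gives 1, since every sample of 10 falls in exactly one split.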


e. Definition of systematic sampling. Let us suppose that a population consists of the elements $A_1, A_2, \cdots, A_N$ arranged in some fixed order and that the possible samples from the population are defined to be the subsets $A^*_1, \cdots, A^*_K$ of the population, where the elements of $A^*_t$ are $A_t, A_{K+t}, A_{2K+t}, \cdots, A_{(n-1)K+t}$. (Note that some of the $A_{(n-1)K+t}$ will not be present if $nK > N$.) Then, if one of the possible samples $A^*_1, \cdots, A^*_K$ is selected, either by an epsem or with probability proportionate to the number of elements it contains or by any other means, we say that a systematic sampling plan is being used.

We shall now suppose for simplicity that $N = Kn$. Then, since each $A_j$ occurs in one and only one of $A^*_1, \cdots, A^*_K$, it follows that, if the sample is selected by an epsem,

$$Pr(A_j \text{ is in the sample}) = Pr(\text{the } A^*_t \text{ containing } A_j \text{ is the sample}) = \frac{1}{K} = \frac{n}{N}$$

just as in simple random sampling.

Also, note that, whereas there are $C^N_n$ possible simple random samples of $n$ elements from a population of $N$ elements, there are only $K = N/n$ possible systematic samples of $n$ elements from a population of $N$ elements.
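The construction of the $K$ systematic samples can be sketched in a few lines. This is our own illustration with elements labelled $1, \cdots, N$; the function names are hypothetical. It also shows that, when one sample is chosen by an epsem, each element has probability $1/K$ of selection.

```python
from fractions import Fraction

def systematic_samples(N: int, K: int):
    """The K possible systematic samples A_t*, t = 1..K, of elements 1..N:
    A_t* = {t, K + t, 2K + t, ...}; elements beyond N are simply absent."""
    return [list(range(t, N + 1, K)) for t in range(1, K + 1)]

def inclusion_probability(N: int, K: int, element: int) -> Fraction:
    """Pr(element is in the sample) when one of the K samples is chosen by an epsem."""
    hits = sum(element in sample for sample in systematic_samples(N, K))
    return Fraction(hits, K)

print(systematic_samples(12, 4))        # [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
print(inclusion_probability(12, 4, 7))  # 1/4, i.e. n/N = 3/12
print(systematic_samples(10, 4))        # [[1, 5, 9], [2, 6, 10], [3, 7], [4, 8]]
```

The last line illustrates the case $nK > N$ ($n = 3$, $K = 4$, $N = 10$), in which some samples lack their last element.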

Exercises 

4.1. It is desired to select a sample of law firms. For selecting this sample an 
up-to-date register of lawyers is available. There are N lawyers listed in this 
register. To obtain a law firm, it is planned that 1 lawyer be selected by an 
epsem, and the firm to which this lawyer belongs will be in the sample. 

Does this procedure select law firms by an epsem? What is the probability of selecting a specified firm containing 1 lawyer, 3 lawyers, $k$ lawyers?

4.2. A city contains 1000 blocks of which 10 are vacant. A block is selected by an epsem. What is the probability of selecting a vacant block?

4.3. In a town of 1000 families, 100 consist of 1 person, 300 of 2 persons, 500 of 3 persons, and 100 of 4 persons. One family is selected by an epsem. What is the probability that it consists of 2 persons? Two families are selected so that all pairs of families are equally probable. What is the probability that both consist of 3 persons?

Ten families are selected by simple random sampling. What is the probability that 3 of them consist of 1 person, 4 of 2 persons, and 3 of 4 persons?

4.4. If simple random sampling with replacement is used, then what is the probability that neither $A_1$ nor $A_2$ is obtained in a sample of size $n$ from $A_1, \cdots, A_N$?

5. Product events. Independence. Conditional probability. Let $\mathcal{A}$ be the set of possible results $A_1, \cdots, A_K$ of an operation and let these possible results have probabilities $P_1, P_2, \cdots, P_K$, $P_i > 0$, $\sum_{i=1}^{K} P_i = 1$.

Let $A^*$ and $B^*$ be two events; i.e., $A^*$ consists of certain of the elements of $\mathcal{A}$ and $B^*$ consists of certain of the elements of $\mathcal{A}$.

Let $A^*B^*$ be the product event, i.e., the event that occurs if both $A^*$ and $B^*$ occur when the operation is performed. In other words, the event $A^*B^*$ consists of all elementary events common to $A^*$ and $B^*$ and occurs if and only if one of these common elementary events occurs. Consequently, if $A_{i_1}, \cdots, A_{i_k}$ (where $k \le K$) are the only elementary events common to $A^*$ and $B^*$, we have, by the definition of probability (Sec. 3),

$$Pr(A^*B^*) = P_{i_1} + \cdots + P_{i_k} \qquad (5.1)$$

Illustration 5.1. Population $\mathcal{A}$ consists of 4 elements, $A_1, A_2, A_3, A_4$. A simple random sample of 2 elements is selected from this population. The possible samples of 2 elements are

$$A_1A_2;\quad A_1A_3;\quad A_1A_4;\quad A_2A_3;\quad A_2A_4;\quad A_3A_4$$

Let $A^*$ consist of the possible samples containing $A_1$, and let $B^*$ consist of the possible samples containing $A_2$. Then, $Pr(A^*) = Pr(B^*) = \frac{1}{2}$, and $Pr(A^*B^*)$ is the probability of the possible samples containing both $A_1$ and $A_2$ and is equal to $\frac{1}{6}$.

Illustration 5.2. Two coins are tossed, each of the 4 possible results having probability $\frac{1}{4}$. The event $A^*$ is defined to be “at least 1 head occurs.” The event $B^*$ is defined to be “at least 1 tail occurs.” Then, $\mathcal{A}$ consists of the 4 elements HH, HT, TH, and TT; $A^*$ consists of HH, HT, and TH; and $B^*$ consists of HT, TH, and TT. Hence, $Pr(A^*) = Pr(B^*) = \frac{3}{4}$. The product event $A^*B^*$ then consists of HT and TH, so that

$$Pr(A^*B^*) = P_{HT} + P_{TH} = \tfrac{1}{2}$$


If $A^*_1, \cdots, A^*_M$ are $M$ events, then the product event $A^*_1 A^*_2 \cdots A^*_M$ consists of the elements common to all $M$ events.

Definition of independent events. Two events $A^*$ and $B^*$ are called independent if and only if

$$Pr(A^*B^*) = Pr(A^*)\,Pr(B^*)$$

Illustration 5.3. Assume that the population consists of the 4 elements $A_1, A_2, A_3, A_4$ and that a simple random sample of 2 is selected from this population with replacement. The following are the possible samples of 2 to be selected from this population, where the element on the left of each pair listed below represents the element drawn on the first selection and the element on the right represents the element drawn on the second selection:

$$\begin{array}{cccc}
A_1A_1 & A_1A_2 & A_1A_3 & A_1A_4 \\
A_2A_1 & A_2A_2 & A_2A_3 & A_2A_4 \\
A_3A_1 & A_3A_2 & A_3A_3 & A_3A_4 \\
A_4A_1 & A_4A_2 & A_4A_3 & A_4A_4
\end{array}$$


Let $A^*$ be the event that $A_1$ be drawn on the first selection of a sample of 2. Let $B^*$ be the event that $A_3$ be drawn on the second selection of a sample of 2. $Pr(A^*) = Pr(B^*) = \frac{1}{4}$, since, in 4 of the 16 pairs listed above, $A_1$ is the first selection, and similarly in 4 of the 16 pairs $A_3$ is the second selection. Now $A^*B^*$ is the event that $A_1$ be drawn on the first selection and $A_3$ on the second selection. Hence, $Pr(A^*B^*) = \frac{1}{16}$. By the definition of independence, $A^*$ and $B^*$ are independent events, since $Pr(A^*B^*) = Pr(A^*)\,Pr(B^*) = \frac{1}{16}$.
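The counting argument of Illustration 5.3 can be reproduced mechanically. In this sketch (our own illustration, with elements labelled 1 to 4 and hypothetical function names), an event is a predicate on the ordered pair, and its probability is the sum of the probabilities of the pairs it contains, as in (5.1).

```python
from fractions import Fraction
from itertools import product

# The 16 ordered pairs (first draw, second draw) from A_1..A_4, labelled 1..4,
# each with probability 1/16 under sampling with replacement.
PAIRS = list(product(range(1, 5), repeat=2))
P_EACH = Fraction(1, len(PAIRS))

def pr(event) -> Fraction:
    """Probability of an event: sum of the probabilities of the pairs in it."""
    return sum((P_EACH for s in PAIRS if event(s)), Fraction(0))

def first_is_a1(s):   # event A*: A_1 on the first selection
    return s[0] == 1

def second_is_a3(s):  # event B*: A_3 on the second selection
    return s[1] == 3

def both(s):          # product event A*B*
    return first_is_a1(s) and second_is_a3(s)

print(pr(first_is_a1), pr(second_is_a3), pr(both))      # 1/4 1/4 1/16
print(pr(both) == pr(first_is_a1) * pr(second_is_a3))  # True
```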

Illustration 5.4. Let a dime and a quarter be tossed and let the result of tossing the dime be listed first. Let the tossing be such that all 4 possible results have equal probability $\frac{1}{4}$. Define $A^*$ to be the event “heads occurs when the dime is tossed” and $B^*$ the event “heads occurs when the quarter is tossed.” Then

$$Pr(A^*) = P_{HH} + P_{HT} = \tfrac{1}{2}$$

$$Pr(B^*) = P_{HH} + P_{TH} = \tfrac{1}{2}$$

and

$$Pr(A^*B^*) = P_{HH} = \tfrac{1}{4}$$

Hence, $A^*$ and $B^*$ are independent, since $Pr(A^*B^*) = Pr(A^*)\,Pr(B^*)$.

If the events $A^*$ and $B^*$ were defined as in Illustration 5.2, then we would have

$$Pr(A^*) = Pr(B^*) = \tfrac{3}{4}$$

and

$$Pr(A^*B^*) = \tfrac{1}{2} \ne \tfrac{9}{16} = Pr(A^*)\,Pr(B^*)$$

so that $A^*$ and $B^*$ would not be independent.
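The two pairs of coin events can be compared with a small independence test. In this sketch (our own illustration; names are hypothetical), events are Python sets of results, the product event is the set intersection, and all 4 results have probability $\frac{1}{4}$ as in Illustration 5.4.

```python
from fractions import Fraction

# Dime listed first, quarter second; all 4 results equally probable.
P = {"HH": Fraction(1, 4), "HT": Fraction(1, 4), "TH": Fraction(1, 4), "TT": Fraction(1, 4)}

def pr(event: set) -> Fraction:
    """Sum of the elementary probabilities of the results in the event."""
    return sum((P[r] for r in event), Fraction(0))

def independent(a: set, b: set) -> bool:
    """Pr(A*B*) == Pr(A*) Pr(B*), with A*B* taken as the set intersection."""
    return pr(a & b) == pr(a) * pr(b)

dime_heads, quarter_heads = {"HH", "HT"}, {"HH", "TH"}                         # Illustration 5.4
at_least_one_head, at_least_one_tail = {"HH", "HT", "TH"}, {"HT", "TH", "TT"}  # Illustration 5.2

print(independent(dime_heads, quarter_heads))             # True
print(independent(at_least_one_head, at_least_one_tail))  # False
```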

Exercise 5.1. Show that $A^*$ and $B^*$ in Illustration 5.1 are not independent.


Definition of conditional probability. Let $A^*$ and $B^*$ be two events. If $Pr(B^*) > 0$, then the conditional probability of the occurrence of $A^*$ subject to the occurrence of $B^*$ is defined to be

$$Pr(A^*|B^*) = \frac{Pr(A^*B^*)}{Pr(B^*)}$$


Conditional operation. We shall now consider the following operation, which we call a conditional operation. Let $B^*$ be an event, $0 < Pr(B^*) < 1$, and suppose that the only change in the original operation is that all results possible originally which are not contained in $B^*$ become impossible. Physically, this could be accomplished by skipping any performance in which $B^*$ does not occur. Thus, the possible results of the conditional operation are the elements of $B^*$. We shall denote their probabilities by $Pr(A_i|B^*)$. Then by our previous discussion of the notion of an operation we have $Pr(A_i|B^*) > 0$ and

$$\sum_{A_i \in B^*} Pr(A_i|B^*) = 1$$

where by $\sum_{A_i \in B^*}$ we mean the summation over all elements $A_i$ that are in $B^*$. Note that $A_i \in B^*$ is shorthand for “$A_i$ is an element of $B^*$.” Also from the definition of conditional probability it follows that

$$Pr(A_i|B^*) = \begin{cases} \dfrac{P_i}{Pr(B^*)} & \text{if } A_i \text{ is an element of } B^* \\[6pt] 0 & \text{if } A_i \text{ is not an element of } B^* \end{cases}$$


Illustration 5.5. Assume that the population consists of 5 elements, $A_1, A_2, A_3, A_4$, and $A_5$. Assume that a simple random sample of 2 elements is selected without replacement from this population. We shall denote the result of the first selection by $a_1$ and of the second by $a_2$. Then the possible samples are given in the following table, where a dash (-) indicates that the sample is impossible and a check (✓) indicates that the sample consists of the elements shown in the row and column headings.

$$\begin{array}{c|ccccc}
a_1 \backslash a_2 & A_1 & A_2 & A_3 & A_4 & A_5 \\ \hline
A_1 & - & \checkmark & \checkmark & \checkmark & \checkmark \\
A_2 & \checkmark & - & \checkmark & \checkmark & \checkmark \\
A_3 & \checkmark & \checkmark & - & \checkmark & \checkmark \\
A_4 & \checkmark & \checkmark & \checkmark & - & \checkmark \\
A_5 & \checkmark & \checkmark & \checkmark & \checkmark & -
\end{array}$$

It is clear from the above table that the possible selections for $a_2$ depend on which element is selected for $a_1$. In fact

$$Pr(a_2 = A_i|a_1 = A_i) = 0$$

$$Pr(a_2 = A_j|a_1 = A_i) = \tfrac{1}{4} \quad \text{if } i \ne j$$

where $Pr(a_2 = A_j|a_1 = A_i)$ is read “the conditional probability that $a_2 = A_j$ given that (or subject to the condition that) $a_1 = A_i$.” Note that $a_1$ and $a_2$ are random events (discussed more fully in Chapter 3).
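These conditional probabilities follow directly from the definition $Pr(A^*|B^*) = Pr(A^*B^*)/Pr(B^*)$. The sketch below is our own illustration (elements labelled 1 to 5; names hypothetical); it treats the 20 ordered samples without replacement as equally probable and applies that definition.

```python
from fractions import Fraction
from itertools import permutations

# Ordered samples (a1, a2) of 2 distinct elements from A_1..A_5, labelled 1..5;
# each of the 20 is equally probable when selecting without replacement.
SAMPLES = list(permutations(range(1, 6), 2))
P_EACH = Fraction(1, len(SAMPLES))

def pr(event) -> Fraction:
    return sum((P_EACH for s in SAMPLES if event(s)), Fraction(0))

def conditional(i: int, j: int) -> Fraction:
    """Pr(a2 = A_j | a1 = A_i) = Pr(a1 = A_i and a2 = A_j) / Pr(a1 = A_i)."""
    joint = pr(lambda s: s[0] == i and s[1] == j)
    return joint / pr(lambda s: s[0] == i)

print(conditional(1, 1), conditional(1, 3))  # 0 1/4
```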

Illustration 5.6. Suppose that a population consists of 2 blocks $B_1$ and $B_2$, on the first of which are located 3 houses $A_{11}, A_{12}, A_{13}$, while on the second are located 2 houses $A_{21}$ and $A_{22}$. We first select 1 block by an epsem, and then from the selected block select 1 house by an epsem. Denote the result of selecting a block by $b$ and the result of selecting a house by $a$. Then

$$Pr(a = A_{i2}|b = B_j) = \begin{cases} 0 & \text{if } i \ne j \\ \tfrac{1}{3} & \text{if } i = j = 1 \\ \tfrac{1}{2} & \text{if } i = j = 2 \end{cases}$$

where we read $Pr(a = A_{i2}|b = B_j)$ as the “conditional probability that the selected house is the second on block $i$ subject to the condition that the selected block is the $j$th block.” Here $a$ and $b$ are random events (see Chapter 3 for fuller discussion).

Illustration 5.7. Suppose that, of the $N$ persons in a population, $N_{A^*}$ have incomes over \$3000, and $N_{A^*B^*}$ have incomes over \$3000 and expenditures under \$2500. An epsem is used to select one of the $N$ persons. Then from the definition of the probability of an event we see that

$$Pr(\text{the selected person has over \$3000 income}) = \frac{N_{A^*}}{N}$$

$$Pr(\text{the selected person has over \$3000 income and expenditures under \$2500}) = \frac{N_{A^*B^*}}{N}$$

$$Pr(\text{the selected person has expenditures under \$2500 if the selected person has over \$3000 income}) = \frac{N_{A^*B^*}/N}{N_{A^*}/N} = \frac{N_{A^*B^*}}{N_{A^*}}$$

Exercises 

5.2. A population consists of $M$ blocks, the $i$th of which contains $N_i > 0$ houses. A house is selected by first selecting a block by an epsem, and then selecting a house from the selected block by an epsem. What is the probability of selecting a specific house on block $i$? What would be the probability of selecting that house if one house were selected from all the houses by an epsem?

5.3. Suppose now that, instead of selecting a block by an epsem, we select a block so that the probability of selecting the $i$th block is $N_i/N$, $i = 1, \cdots, M$, $N = N_1 + \cdots + N_M$. From the selected block a house is then selected by an epsem. Answer the two questions asked in connection with Ex. 5.2.

5.4. Suppose that a simple random sample of 2 elements is selected with replacement from the population given in Illustration 5.5. Show that

$$Pr(a_2 = A_i|a_1 = A_i) = \tfrac{1}{5}$$

$$Pr(a_2 = A_j|a_1 = A_i) = \tfrac{1}{5} \quad \text{for } i \ne j$$

6. Some theorems on probabilities. Let $A^*_1, \cdots, A^*_M$ be events associated with a particular operation. Then by $A^*_1 + A^*_2 + \cdots + A^*_M$, the sum event, we mean the event that occurs if at least one of the $A^*_i$ occurs. By $A^*_1 A^*_2 \cdots A^*_M$, the product event, we mean, as earlier defined in Sec. 5, the event that occurs if $A^*_1, A^*_2, \cdots$, and $A^*_M$ all occur.

Illustration 6.1. Let us suppose that $M = 10$ and that $A_1, \cdots, A_{10}$ are the 10 spots in Fig. 1. As shown by the figure, $A^*_1, A^*_2, A^*_3$, and $A^*_4$ are [...] and [...] will occur. On the other hand, $A^*_1A^*_2A^*_3A^*_4$ can never occur, since no element is an element of all four of them. We call $A^*_1A^*_2A^*_3A^*_4$ a null event, i.e., an event that cannot occur. The product event [...] will occur if and only if [...] occurs, while the product event [...] is a null event. The complementary event of $A^*_1 + \cdots + A^*_4$ is [...].

Theorem 2. If $Pr(A^*_1 A^*_2 \cdots A^*_K) > 0$, then $Pr(A^*_1 A^*_2 \cdots A^*_h) > 0$ for any integer $h < K$.

Proof. By assumption, there is at least one elementary event $A_i$ with probability $P_i > 0$ common to all the events $A^*_1, \cdots, A^*_K$. Any such elementary event is also common to $A^*_1, \cdots, A^*_h$, so that the sum of the $P_i$ over the elementary events common to $A^*_1, \cdots, A^*_h$ will be at least $P_i > 0$.





Theorem 3. If $A^*_1, \cdots, A^*_K$ are $K$ events, with $Pr(A^*_1 A^*_2 \cdots A^*_K) > 0$, then

$$Pr(A^*_1 A^*_2 \cdots A^*_K) = Pr(A^*_1)\,Pr(A^*_2|A^*_1) \cdots Pr(A^*_K|A^*_1 \cdots A^*_{K-1})$$

(By Theorem 2, $Pr(A^*_1 \cdots A^*_h) > 0$ for every $h < K$, so all the conditional probabilities on the right are defined.)

Proof. If $K = 1$, then each side of the equation is $Pr(A^*_1)$. Suppose now that the theorem is true for $K = j$. Then we shall show the theorem to be true for $K = j + 1$, which will complete the proof by induction. This means that the theorem is proved for any positive integral value of $K$ by the following steps: The theorem is true if $K = 1$. Then, putting $j = 1$, we show that it follows for $K = 2$. Then the theorem is true if $K = 2$, and, putting $j = 2$, we show that it follows for $K = 3$, and so on.

Now, by the definition of conditional probability, treating $A^*_1 A^*_2 \cdots A^*_j$ as one event, we have

$$Pr(A^*_1 A^*_2 \cdots A^*_{j+1}) = Pr(A^*_1 A^*_2 \cdots A^*_j)\,Pr(A^*_{j+1}|A^*_1 A^*_2 \cdots A^*_j)$$

By the hypothesis of the induction, Theorem 3 is true for $K = j$. Substituting for $Pr(A^*_1 A^*_2 \cdots A^*_j)$, we see that Theorem 3 is also true for $K = j + 1$, which completes the proof.
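Theorem 3 can be checked numerically on any finite operation. The sketch below is our own illustration: the 10 results, their probabilities $P_i = (i+1)/55$, and the three events are hypothetical choices. It builds the product event step by step, multiplies the successive conditional probabilities, and compares the result with the directly computed $Pr(A^*_1A^*_2A^*_3)$.

```python
from fractions import Fraction

def pr(event, probs):
    """Pr(event): sum of the elementary probabilities P_i over the results in it."""
    return sum((probs[r] for r in event), Fraction(0))

def chain_rule(events, probs):
    """Pr(A1*) Pr(A2*|A1*) ... Pr(AK*|A1*...A(K-1)*), the right side of Theorem 3."""
    result = pr(events[0], probs)
    running = set(events[0])               # product event A1*...Aj* built so far
    for event in events[1:]:
        result *= pr(running & event, probs) / pr(running, probs)
        running &= event
    return result

# Hypothetical operation with 10 results, P_i = (i + 1)/55 > 0, summing to 1.
probs = {r: Fraction(r + 1, 55) for r in range(10)}
events = [set(range(8)), set(range(1, 9)), {2, 3, 4, 9}]

direct = pr(set.intersection(*events), probs)  # Pr(A1* A2* A3*) computed directly
print(direct, chain_rule(events, probs))       # 12/55 12/55
```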

Several events $A^*_1, \cdots, A^*_K$ are said to be mutually exclusive if no element is common to 2 or more of them and are said to be exhaustive if $A^*_1 + \cdots + A^*_K$ is the entire population. Thus, if $A^*$ is an event, then $A^*$ and $\bar{A}^*$ (the complementary event) are both mutually exclusive and exhaustive.

Illustration 6.2. Let us define the event $A^*_5$ to be the complementary event of $A^*_1 + \cdots + A^*_4$. Then, in Illustration 6.1, $A^*_1, \cdots, A^*_5$ are exhaustive but not mutually exclusive. [...] and [...] are mutually exclusive but not exhaustive. The five events [...] are both exhaustive and mutually exclusive.

Theorem 4. If $A^*$ is a subset of $B^*$, then $A^*B^* = A^*$.

Proof. Every element in $A^*$ is also in $B^*$, and hence the elements in $A^*B^*$ are exactly those in $A^*$.

The following theorem is of considerable importance in calculating probabilities.

Theorem 5. Let $A^*_1, \cdots, A^*_K$ be mutually exclusive events. Then, if $A^* = A^*_1 + \cdots + A^*_K$, it follows that $Pr(A^*) = Pr(A^*_1) + \cdots + Pr(A^*_K)$.

Proof. Since $A^* = A^*_1 + \cdots + A^*_K$, and since $A^*_1, A^*_2, \cdots, A^*_K$ are mutually exclusive, the theorem follows immediately if we substitute for $Pr(A^*)$ and $Pr(A^*_1) + \cdots + Pr(A^*_K)$ the sum of the probabilities of the elementary events of which they consist, using the definition of the probability of an event.

Corollary 1. Let $A^*_1, \cdots, A^*_K$ be mutually exclusive, and let $B^*$ and $C^*$ be any events, with $Pr(B^*) > 0$. Then, if $A^* = A^*_1 + \cdots + A^*_K$, it follows that

$$Pr(A^*C^*|B^*) = Pr(A^*_1C^*|B^*) + \cdots + Pr(A^*_KC^*|B^*)$$

Proof. Since $A^*_1, \cdots, A^*_K$ are mutually exclusive, it follows that $A^*_1C^*, \cdots, A^*_KC^*$ are mutually exclusive. Furthermore, the set $A^*C^*$ consists of exactly the same elements as the set $A^*_1C^* + \cdots + A^*_KC^*$, or

$$A^*C^* = A^*_1C^* + \cdots + A^*_KC^*$$

and similarly

$$A^*C^*B^* = A^*_1C^*B^* + \cdots + A^*_KC^*B^*$$

where $A^*_1C^*B^*, \cdots, A^*_KC^*B^*$ are mutually exclusive, since $A^*_1, A^*_2, \cdots, A^*_K$ are mutually exclusive. Then, by Theorem 5,

$$Pr(A^*C^*|B^*) = \frac{Pr(A^*C^*B^*)}{Pr(B^*)} = Pr(A^*_1C^*|B^*) + Pr(A^*_2C^*|B^*) + \cdots + Pr(A^*_KC^*|B^*)$$

An alternative proof for the special case where an epsem is being used (and $B^*$ is the entire population) is as follows: Suppose that $A^*_i$ consists of $N_i$ elements, $i = 1, \cdots, K$. Of these $N_i$ elements suppose that $N_{iC}$ are also in $C^*$. Then $A^*_iC^*$ consists of $N_{iC}$ elements, none of which can be in $A^*_jC^*$ ($i \ne j$), since the elements are in $A^*_i$, and no element of $A^*_i$ is also in $A^*_j$. Furthermore, $A^*C^*$ consists of the $N_{1C} + N_{2C} + \cdots + N_{KC}$ elements which are both in one of the $A^*_i$ and in $C^*$. Hence, from the definition of the probability of an event, we have

$$Pr(A^*C^*) = Pr(A^*_1C^*) + \cdots + Pr(A^*_KC^*)$$

There are several other interesting corollaries to Theorem 5. Of these we give the two that follow:

Corollary 2. If $A^*_1, \cdots, A^*_K$ are exhaustive as well as mutually exclusive, then $Pr(A^*C^*|B^*) = Pr(C^*|B^*)$, and we have

$$Pr(C^*|B^*) = Pr(A^*_1C^*|B^*) + \cdots + Pr(A^*_KC^*|B^*)$$

Corollary 3. If $A^*_1, \cdots, A^*_K$ are mutually exclusive, then

$$\begin{aligned} Pr(A^*C^*|B^*) &= Pr(A^*_1|B^*C^*)\,Pr(C^*|B^*) + \cdots + Pr(A^*_K|B^*C^*)\,Pr(C^*|B^*) \\ &= Pr(C^*|A^*_1B^*)\,Pr(A^*_1|B^*) + \cdots + Pr(C^*|A^*_KB^*)\,Pr(A^*_K|B^*) \end{aligned}$$
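Both forms in Corollary 3 can be verified on a small example. In the sketch below (our own illustration), the 12 equally probable results and the events $A^*_1, A^*_2, A^*_3$, $B^*$, $C^*$ are hypothetical, chosen so that every conditioning event has positive probability.

```python
from fractions import Fraction

SPACE = range(12)
P = {r: Fraction(1, 12) for r in SPACE}       # hypothetical equally probable results

def pr(event):
    return sum((P[r] for r in event), Fraction(0))

def cond(event, given):
    """Pr(event | given) = Pr(event and given) / Pr(given)."""
    return pr(event & given) / pr(given)

parts = [{0, 1, 2}, {3, 4}, {5, 6, 7}]        # mutually exclusive A_1*, A_2*, A_3*
A = set().union(*parts)                        # A* = A_1* + A_2* + A_3*
B = {1, 2, 3, 5, 6, 8, 9}
C = {2, 3, 6, 7, 9, 10}

lhs = cond(A & C, B)
first_form = sum((cond(Ai, B & C) * cond(C, B) for Ai in parts), Fraction(0))
second_form = sum((cond(C, Ai & B) * cond(Ai, B) for Ai in parts), Fraction(0))
print(lhs, first_form, second_form)  # 3/7 3/7 3/7
```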





7. Some illustrations of the uses of theorems on independent and conditional probability. a. Evaluation of some probabilities associated with simple random sampling. We have previously seen that the probability of any specified possible sample is $1/C^N_n$ if simple random sampling is being used. Let us now prove that result and others by conditional probability.

(1) First, let $B^*_i$ be the event “$A_i$ is in the sample.” Then the product event $B^*_1 B^*_2 \cdots B^*_n$ is the event “$A_1, A_2, \cdots, A_n$ are in the sample,” since the product event occurs if and only if all the component events occur. By Theorem 3

$$Pr(B^*_1 B^*_2 \cdots B^*_n) = Pr(B^*_1)\,Pr(B^*_2|B^*_1) \cdots Pr(B^*_n|B^*_1 \cdots B^*_{n-1})$$

Now

$$Pr(B^*_1) = \frac{n}{N}$$

Also $Pr(B^*_2|B^*_1) = (n-1)/(N-1)$, since the condition that $A_1$ is in the sample reduces the problem to one of selecting $n - 1$ elements by simple random sampling from a population consisting of $A_2, \cdots, A_N$. Similarly

$$Pr(B^*_i|B^*_1 \cdots B^*_{i-1}) = \frac{n - i + 1}{N - i + 1}$$

so that

$$Pr(\text{sample consists of } A_1, \cdots, A_n) = \frac{n(n-1)\cdots 1}{N(N-1)\cdots(N-n+1)} = \frac{1}{C^N_n}$$
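The product of conditional probabilities in (1) can be evaluated exactly with fractions. This minimal sketch (our own illustration; `prob_specified_sample` is a hypothetical name) multiplies $(n/N)\,((n-1)/(N-1)) \cdots (1/(N-n+1))$ and compares it with $1/C^N_n$.

```python
from fractions import Fraction
from math import comb

def prob_specified_sample(N: int, n: int) -> Fraction:
    """Pr(B1*) Pr(B2*|B1*) ... Pr(Bn*|B1*...B(n-1)*) = prod over i of (n - i)/(N - i)."""
    p = Fraction(1)
    for i in range(n):
        p *= Fraction(n - i, N - i)
    return p

print(prob_specified_sample(10, 3), Fraction(1, comb(10, 3)))  # 1/120 1/120
```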

(2) The probability that $A_i$ is the $K$th element selected for the sample is obtained similarly. Let $B^*_j$ be the event “$A_i$ is not obtained at the $j$th selection,” $j = 1, \cdots, K - 1$, and let $B^*_K$ be the event “$A_i$ is obtained at the $K$th selection.” Then we want to evaluate the probability of $B^*_1 B^*_2 \cdots B^*_K$, and by Theorem 3

$$\begin{aligned} Pr(B^*_1 B^*_2 \cdots B^*_K) &= Pr(B^*_1)\,Pr(B^*_2|B^*_1) \cdots Pr(B^*_K|B^*_1 \cdots B^*_{K-1}) \\ &= \frac{N-1}{N} \cdot \frac{N-2}{N-1} \cdots \frac{N-K+1}{N-K+2} \cdot \frac{1}{N-K+1} \\ &= \frac{1}{N} \end{aligned}$$
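The result $1/N$ can be confirmed by listing every ordered sequence of draws. The sketch below is our own illustration (hypothetical names). It assumes, as in simple random sampling carried out one draw at a time, that all ordered sequences of $n$ distinct elements are equally probable, and it compares the enumeration with the telescoping product above.

```python
from fractions import Fraction
from itertools import permutations

def prob_kth_draw_enumerated(N: int, n: int, i: int, K: int) -> Fraction:
    """Share of equally likely ordered draws (without replacement) whose Kth draw is element i."""
    draws = list(permutations(range(1, N + 1), n))
    hits = sum(1 for d in draws if d[K - 1] == i)
    return Fraction(hits, len(draws))

def prob_kth_draw_product(N: int, K: int) -> Fraction:
    """(N-1)/N * (N-2)/(N-1) * ... * (N-K+1)/(N-K+2) * 1/(N-K+1)."""
    p = Fraction(1)
    for j in range(K - 1):
        p *= Fraction(N - 1 - j, N - j)    # A_i not obtained at selection j + 1
    return p * Fraction(1, N - K + 1)      # A_i obtained at selection K

print(prob_kth_draw_enumerated(5, 3, 2, 3), prob_kth_draw_product(5, 3))  # 1/5 1/5
```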

b. Definition of stratified simple random sampling. Suppose that the $N$ elements of a population are classified into $L$ strata, the $i$th of which contains $N_i$ elements, $i = 1, \cdots, L$. Suppose also that simple random samples are independently selected from each of the strata. Then we say that a stratified simple random sampling plan is used.

It is instructive to compare the numbers of possible samples of size $n$ when simple random sampling and stratified simple random sampling are used.

There are $C^N_n$ possible samples of size $n$ from a population of size $N$ if simple random sampling is used.

Let $n = n_1 + \cdots + n_L$. Then, from Theorem A.2 of the Appendix, there are $C^{N_1}_{n_1} C^{N_2}_{n_2} \cdots C^{N_L}_{n_L}$ possible stratified simple random samples consisting of $n_1$ elements from stratum 1, $n_2$ elements from stratum 2, $\cdots$, $n_L$ elements from stratum $L$.

Clearly, since each possible stratified simple random sample of $n = n_1 + \cdots + n_L$ elements is also a possible simple random sample of $n$ elements, it follows that

$$C^{N_1}_{n_1} C^{N_2}_{n_2} \cdots C^{N_L}_{n_L} \le C^N_n$$

and, in fact, since every sample of size $n$ will, for some values of $n_1, \cdots, n_L$, consist of $n_1$ elements from stratum 1, $n_2$ elements from stratum 2, $\cdots$, $n_L$ elements from stratum $L$, we have

$$C^N_n = \sum C^{N_1}_{n_1} C^{N_2}_{n_2} \cdots C^{N_L}_{n_L}$$

where the summation is over all $n_1, \cdots, n_L$ such that $n_i \ge 0$ and $n_1 + \cdots + n_L = n$.
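The identity can be checked by summing over all allocations. In this sketch (our own illustration; the stratum sizes are hypothetical), each $n_i$ runs from 0 to $N_i$, since a term with $n_i > N_i$ would count no samples.

```python
from itertools import product
from math import comb, prod

def count_by_allocation(stratum_sizes, n):
    """Sum over allocations with n_1 + ... + n_L = n and 0 <= n_i <= N_i of
    C(N_1, n_1) ... C(N_L, n_L): all samples of size n grouped by allocation."""
    total = 0
    for alloc in product(*(range(N_i + 1) for N_i in stratum_sizes)):
        if sum(alloc) == n:
            total += prod(comb(N_i, n_i) for N_i, n_i in zip(stratum_sizes, alloc))
    return total

sizes = [4, 3, 5]                                          # hypothetical N_1, N_2, N_3
print(count_by_allocation(sizes, 5), comb(sum(sizes), 5))  # 792 792
```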

We have already seen that if a simple random sample of size $n$ is selected, then the probability that some specified element, say $A_k$, is in the sample is $n/N$. If that element is classified into the $j$th stratum, and if $n_j$ elements are selected from that stratum in the stratified simple random sample, then the probability that $A_k$ is in the sample becomes $n_j/N_j$. Furthermore, if proportionate sampling is used, i.e., if the sampling ratio $n_j/N_j = f$, $j = 1, \cdots, L$, then $n_j = fN_j$ and hence

$$\frac{n}{N} = \frac{n_1 + \cdots + n_L}{N_1 + \cdots + N_L} = \frac{f(N_1 + \cdots + N_L)}{N_1 + \cdots + N_L} = f = \frac{n_j}{N_j}$$

Thus, if proportionate stratified simple random sampling is used, then the probability of a specified element being in the sample is the same as when a simple random sample of the same size is selected; whereas, if the sampling is disproportionate, then, of course, this will not be so in any stratum for which $n_j/N_j \ne n/N$.
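A small numerical comparison makes the point concrete. The stratum sizes and the two allocations below are hypothetical, chosen by us for illustration; both allocations have total sample size $n = 100$ from $N = 1000$.

```python
from fractions import Fraction

def inclusion_probabilities(stratum_sizes, allocation):
    """Pr(a specified element of stratum j is in the sample) = n_j / N_j."""
    return [Fraction(n_j, N_j) for N_j, n_j in zip(stratum_sizes, allocation)]

sizes = [200, 300, 500]          # hypothetical N_j, so N = 1000
proportionate = [20, 30, 50]     # n_j / N_j = f = 1/10 in every stratum, n = 100
disproportionate = [40, 30, 30]  # also n = 100, but n_j / N_j varies

n_over_N = Fraction(100, 1000)
print([p == n_over_N for p in inclusion_probabilities(sizes, proportionate)])     # [True, True, True]
print([p == n_over_N for p in inclusion_probabilities(sizes, disproportionate)])  # [False, True, False]
```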

c. Cluster sampling. By a cluster we mean a set of elements that are treated as a single element for purposes of selecting a sample. For example, the people within a family constitute a cluster if we select families; the people or families within a block constitute a cluster if we select blocks; the people or families or farms within a county constitute a cluster if we select counties. Thus, clusters may be small or large. They are treated as units in selecting a sample.

We have already derived the probabilities needed for simple random sampling, stratified simple random sampling, and systematic sampling of clusters. Once the clusters have been selected, however, the selected clusters may themselves be treated as a population from which a sample is selected. When this is done, we are using cluster sampling with subsampling.

Let the clusters be denoted by $A_1, \cdots, A_M$ and let the cluster $A_i$ consist of the elements $A_{i1}, \cdots, A_{iN_i}$. Suppose that we select a simple random sample of $m$ clusters. Then the probability that we select, say, $A_1, \cdots, A_m$ is $1/C^M_m$. Once we have selected $A_1, \cdots, A_m$, we may treat these clusters as a single population and select a simple random sample or systematic sample from it, or we may treat the selected clusters as substrata and select a stratified simple random sample or stratified systematic sample from the strata. We then use Theorem 2 (p. 27) to calculate the probability of obtaining a specified sample of elements. This probability is obtained by multiplying $1/C^M_m$ by the conditional probability of obtaining a sample of elements from the selected clusters.

Suppose that $m$ of the $M$ clusters are selected by simple random sampling and that from the selected clusters a subsample of elements is selected, again by simple random sampling, $n_j$ elements being taken from the $j$th cluster if it is selected. Then the probability of selecting a specified element of the $j$th cluster is

$$\frac{m}{M} \cdot \frac{n_j}{N_j} = \frac{m n_j}{M N_j}$$

where $N_j$ is the total number of elements in the $j$th cluster. In the special case where $n_j/N_j = Mn/(mN)$, the probability of selecting a specified element is $\frac{m}{M} \cdot \frac{Mn}{mN} = \frac{n}{N}$, the same probability of selecting a specified element as existed for simple random sampling and for proportionate stratified sampling. The probability of obtaining a specified pair of elements differs, and it is this difference which has a bearing on the relative accuracy of the three sampling designs.

For simple random sampling the probability of obtaining a specified pair of elements is $\frac{n(n-1)}{N(N-1)}$. For stratified random sampling the probability depends on whether both elements are in the same or different strata, even if proportionate sampling is used. If both elements are in the $j$th stratum, the probability is $\frac{n_j(n_j-1)}{N_j(N_j-1)}$, whereas if they are in the $j$th and $k$th strata, the probability is $\frac{n_j}{N_j} \cdot \frac{n_k}{N_k}$. For cluster sampling, if both elements are in the same cluster, the probability is $\frac{m}{M} \cdot \frac{n_j(n_j-1)}{N_j(N_j-1)}$, where $N_j$ is the size of the cluster and $n_j$ the size of the sample selected from that cluster, whereas if the elements are in the $j$th and $k$th clusters, the probability is

$$\frac{m(m-1)}{M(M-1)} \cdot \frac{n_j n_k}{N_j N_k}$$
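The single-element and pair probabilities for cluster sampling with subsampling can be confirmed by listing every possible sample of a small design. The sketch below is our own illustration. The design ($M = 3$ clusters of sizes 3, 4, 2; $m = 2$; subsample sizes $n_j = 2, 2, 1$) and all function names are hypothetical. Element $(j, k)$ denotes the $k$th element of cluster $j$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def two_stage_distribution(cluster_sizes, m, sub_sizes):
    """All possible samples with their probabilities: an SRS of m of the M clusters,
    then an independent SRS of n_j elements from each selected cluster j."""
    M = len(cluster_sizes)
    dist = []
    for chosen in combinations(range(M), m):
        p = Fraction(1, comb(M, m))                    # Pr(this set of clusters)
        subsamples = []
        for j in chosen:
            members = [(j, k) for k in range(cluster_sizes[j])]
            subsamples.append(list(combinations(members, sub_sizes[j])))
            p /= comb(cluster_sizes[j], sub_sizes[j])  # each subsample equally likely
        for parts in product(*subsamples):
            dist.append((frozenset(e for part in parts for e in part), p))
    return dist

def pr_contains(dist, *elements):
    """Probability that the sample contains all the given elements."""
    return sum((p for sample, p in dist if all(e in sample for e in elements)), Fraction(0))

sizes, m, subs = [3, 4, 2], 2, [2, 2, 1]
M = len(sizes)
dist = two_stage_distribution(sizes, m, subs)

single = Fraction(m, M) * Fraction(subs[1], sizes[1])
same_cluster = Fraction(m, M) * Fraction(subs[1] * (subs[1] - 1), sizes[1] * (sizes[1] - 1))
two_clusters = Fraction(m * (m - 1), M * (M - 1)) * Fraction(subs[0] * subs[1], sizes[0] * sizes[1])

print(pr_contains(dist, (1, 0)) == single)                # True
print(pr_contains(dist, (1, 0), (1, 1)) == same_cluster)  # True
print(pr_contains(dist, (0, 0), (1, 0)) == two_clusters)  # True
```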

8. Methods of achieving probability (or measurable) sampling plans. 
The table of random numbers. By a probability (or a measurable) sampling 
plan we shall mean a sampling plan where the elements are selected with 
known probabilities. We have already indicated the importance of a list 
in selecting samples; i.e., when the list is known, each of the elements of 
the population is identified by a number and if that number is selected 
then the corresponding element of the population is in the sample. For 
this reason, in sampling, we often want to select a sample of numbers. 
If we can use a probability selection method for obtaining a sample of 
numbers, then our sampling plan is a probability plan for obtaining a 
sample from the list or population represented by it. The most important 
and frequently used tool in obtaining a probability sampling plan is the
table of random numbers. 

Tests have been performed on the tables of random sampling numbers 
in common use that make it reasonable to assume that the following 
statements are true for all practical purposes: 

(1) Each number is the result of performing an epsem. 

(2) The selection operations are independent. 

It follows from the above two statements that each possible pair of numbers in the table may be interpreted as the result of performing an epsem to select one number from $1, \cdots, 100$, where 100 is said to be selected if and only if we obtain 00.

The results of 100 performances of the operation can also be used to 
obtain the results of 100 tosses of a true coin by referring to an odd 
number as heads and to an even number or zero as tails. 

Problems on the table of random numbers 

(1) Select an integer from $1, \cdots, 113$ by an epsem. Two methods are in common use.

Method a. Choose any 3 columns of a table of random numbers* (p. 117, Vol. I), say columns 5, 6, and 7. As we go down these columns, the numbers are 422, 044, so that 44 is the selected integer, since it is the first integer found between 1 and 113 inclusive.

Method b. Divide the first number found by 113. Now $422/113 = 3 + 83/113$, so the remainder is 83. Then 83 is the selected integer. One precaution must be taken in using method b. The basis of this method is that, since all the numbers 001, $\cdots$, 999, 1000 (1000 is denoted by 000) are equally likely in columns 5, 6, and 7, the remainders yielded by these numbers will occur with equal probability. Yet we would obtain 113 (a remainder of 0 being read as 113) if any of the 8 numbers 113, 226, $\cdots$, 904 occurred, whereas 1 could occur if any of the 9 numbers 1, 114, 227, $\cdots$, 905 occurred. Thus, the numbers $1, \cdots, 113$ would not be equally probable. We can avoid this difficulty by passing by the numbers 905, $\cdots$, 999 and 000 (that is, 905 through 1000) until a smaller number is reached. In this way, while preserving the epsem, we reduce considerably the number of numbers to be passed by: 96 of the 1000 possible numbers, as against 887 in method a.

* M. G. Kendall and B. Babington Smith, Tracts for Computers, No. XXIV, Cambridge University Press, second edition, 1946, p. 8.
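Method b, with the pass-by rule, can be written as a small function. This is our own sketch (`method_b` is a hypothetical name). Table entries are given as integers, with 000 read as 1000. The sketch then checks that, over all 1000 equally likely entries, each of $1, \cdots, 113$ arises exactly 8 times.

```python
from collections import Counter

def method_b(entry: int, upper: int = 113):
    """Map a 3-digit table entry (000 read as 1000) to 1..upper by its remainder
    on division by upper, or return None if the entry must be passed by."""
    value = 1000 if entry == 0 else entry
    usable = (1000 // upper) * upper              # 904 when upper = 113
    if value > usable:
        return None                               # 905, ..., 999 and 000 are passed by
    remainder = value % upper
    return upper if remainder == 0 else remainder  # a remainder of 0 is read as 113

print(method_b(422))  # 83

# Over the 1000 equally likely entries, each of 1..113 now arises exactly 8 times.
counts = Counter(method_b(v) for v in range(1000))
passed_by = counts.pop(None)
print(passed_by, len(counts), set(counts.values()))  # 96 113 {8}
```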

(2) Select three integers from 1 to 113 by an epsem. Let us first note that the statement of the problem needs clarification. Shall we permit an integer to occur more than once among the three selected? Since an epsem was defined in terms of selecting one element, how shall we interpret the selection of more than one element? Let us answer the second of these questions first. If we wish to select several elements from one population, we can either explicitly or implicitly define a second population whose elements are the possible selections of several elements from the first. Then we select one element from the second population by a selection method that gives to each of its elements the desired probability that it be selected. Another method is that of choosing one number at a time until the sample is obtained. Suppose, now, that we do not wish any of the integers from 1 to 113 to be selected more than once. Then, after selecting 44 on the basis of columns 5, 6, 7, we continue on down and reach 9, and, continuing in columns 8, 9, 10, we reach 67. Thus, these are the three selected numbers. If we had been willing to permit a number to be selected more than once, and if we had come upon 44 again as the second number as well as the first, we would have selected it. But, if we did not wish to include the same number more than once, we would pass over 44 the second time we came upon it, just as though it were a number greater than 113.
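The one-number-at-a-time procedure of method a can be sketched as follows. This is our own illustration (`select_from_table` is a hypothetical name). The input is a hypothetical stream of table entries written as integers, so 044 appears as 44. The values 422, 44, 9, and 67 follow the text; the repeated 44 and the 567 are invented to show both kinds of passing by.

```python
def select_from_table(entries, k=3, upper=113, distinct=True):
    """Method a: read 3-digit entries in order (000 read as 1000) and keep those
    in 1..upper. If distinct is True, an integer already chosen is passed over,
    just like a number greater than upper."""
    chosen = []
    for entry in entries:
        value = 1000 if entry == 0 else entry
        if not 1 <= value <= upper:
            continue                  # outside 1..upper: pass by
        if distinct and value in chosen:
            continue                  # repeat: pass by when repeats are not wanted
        chosen.append(value)
        if len(chosen) == k:
            break
    return chosen

stream = [422, 44, 44, 567, 9, 67]                 # hypothetical entries
print(select_from_table(stream))                   # [44, 9, 67]
print(select_from_table(stream, distinct=False))   # [44, 44, 9]
```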

Exercises 

8.1. Toss 2 coins 100 times. For each toss define the possible results to be 0, 1, or 2 heads. Compute the relative frequencies with which each of these possible results occurs in the first $K$ tosses, $K = 1, 2, \cdots, 100$. Is there any tendency of these relative frequencies to become stable? What probabilities would you associate with each of the 3 possible results?

8.2. Open a book 100 times, trying to do so in such a way that you do not know at what page you will open the book. If the next to the last digit (for pages 1, 3, 5, 7, 9 we assume this digit to be 0) of the right-hand page number is 0, 2, 4, the result is 0; otherwise it is 1. Calculate the relative frequencies of 0 and 1 for each of the first $K$ performances of this operation, $K = 1, \cdots, 100$. Is there any tendency of these relative frequencies to stabilize? About what values? If the relative frequencies do not seem to stabilize, what conclusions would you draw?

8.3. Use a table of logarithms to 4 or more places. If the last digit is odd, say that the result 1 has occurred. Otherwise the result 0 has occurred. Calculate the relative frequencies of 0 and 1 for each of the first $K$ performances, $K = 1, \cdots, 100$. Is there any tendency of these relative frequencies to stabilize? About what value? Repeat this exercise, using the next to the last digits of the logarithms.



8.4. Which of the following ways would you regard as most closely approximating an epsem: (a) by opening a book at various points and using the page numbers; (b) by using a table of logarithms; or (c) by writing the numbers on identical-appearing cards or chips, shuffling them very well, and selecting blindfolded? What are the dangers of employing such methods instead of a table of random numbers?

8.5. Suppose now that in using a city directory you number all the listed dwelling units and use an epsem to select one of these dwelling units. Is this the same as using an epsem to select one of the dwelling units in the city?

8.6. Suppose that, on the basis of an up-to-date and complete listing of the population, a sample of, say, 10,000 is selected by means of an epsem and questionnaires are mailed to those selected. Of these 10,000 questionnaires 3000 are returned immediately, 1500 more after one follow-up letter, and 500 after an additional follow-up letter, or 5000 in all. Suppose that these 5000 are kept in a file in the order in which they are received. Several tests are made comparing the results of these 5000 questionnaires and their serial numbers to those of the entire 10,000 questionnaires. None of these tests contradict the assumption that the 5000 questionnaires were obtained by using an epsem. Should these 5000 questionnaires be treated as though obtained by means of an epsem from the same population as that from which the 10,000 elements were selected? What is the danger that the 5000 who returned their questionnaires were essentially selected by an epsem from those who felt strongly about the subject matter of the questionnaire? Can we by an essentially internal analysis of a sample ever obtain satisfactory evidence that it was selected by an epsem from a specified population? Discuss the following: Suppose that a population is expected to have somewhat more males than females and we want to estimate the proportion of males in the population by choosing 100 persons by means of an epsem and using the proportion of males among them. After the 100 pieces of data are recorded someone loses them, and, being afraid to admit it, replaces the data by tossing 100 times a coin having probability of heads equal to .51 and recording male when heads occurs, female when tails occurs.



APPENDIX 


Combinations and Permutations 


Combinations and permutations are introduced so that we can have 
convenient tools for specifying certain sets of elements and for counting 
the numbers of elements in those sets. The discussion is brief since these 
topics will be known to many of the readers of this volume. 

Theorem A.1. Suppose that one operation has $K$ possible results $A_1, \cdots, A_K$. Suppose that if $A_i$ occurs when the first operation is performed, then a second operation has $M_i$ possible results $B_{i1}, \cdots, B_{iM_i}$, $i = 1, \cdots, K$. Then the number of possible results of performing these two operations is $M_1 + \cdots + M_K$.

Proof. The possible results of performing these two operations may be denoted by:

$$\begin{array}{llll}
A_1B_{11}, & A_1B_{12}, & \cdots, & A_1B_{1M_1} \\
A_2B_{21}, & A_2B_{22}, & \cdots, & A_2B_{2M_2} \\
\vdots & & & \\
A_KB_{K1}, & A_KB_{K2}, & \cdots, & A_KB_{KM_K}
\end{array}$$

where the $i$th row above lists all the possible results of both operations such that $A_i$ occurs when the first operation is performed. Counting the number of possible results listed, we see that there are $M_1 + \cdots + M_K$ possible results.

Corollary. If $M_1 = M_2 = \cdots = M_K = M$, i.e., if the number of possible results of performing the second operation is the same, whatever the result of the first operation, then the total number of possible results of performing both operations is $KM$.

The proof consists in replacing $M_i$ by $M$ in Theorem A.1.

Illustration A.1. (a) Suppose that a population consists of 3 blocks, $A_1, A_2, A_3$, and that $A_1$ contains 2 families, $B_{11}, B_{12}$; $A_2$ contains 3 families, $B_{21}, B_{22}, B_{23}$; $A_3$ contains 1 family, $B_{31}$. Then there are 6 possible results of performing the 2 operations of first selecting a block and then selecting a family from the selected block. These possible results are:

$$A_1B_{11},\ A_1B_{12},\ A_2B_{21},\ A_2B_{22},\ A_2B_{23},\ \text{and } A_3B_{31}$$

(b) Suppose that a population consists of 10 elements and that first one element is selected and then from the 9 remaining elements a second element is selected. There are 10 possible results of the first operation, so that $K = 10$. Whatever result occurs for the first operation, there are 9 possible results of the second operation, so that $M_1 = \cdots = M_{10} = 9$. Hence, there are 90 possible results of the pair of operations.

It will simplify the notation if, in the generalization of Theorem A.1 above to N operations, we assume that whatever may be the results of the first $i - 1$ operations, the ith operation has a constant number $K_i$ of possible results. A similar theorem holds when the number of possible results of the ith operation depends on the actual results of performing the first $i - 1$ operations ($i = 1, \ldots, N$).

Theorem A.2. Let us suppose that the ith of N operations has $K_i$ possible results no matter what may be the results of performing the first $i - 1$ operations ($i = 1, \ldots, N$). Then the number of possible results of performing all N operations in a specified order is $K_1K_2 \cdots K_N$.

Proof. If $N = 2$, this theorem has already been proved in the corollary to Theorem A.1. Then the results of performing the first two operations may be considered to be those of performing one (complex) operation with $K_1K_2$ results, so that we obtain $(K_1K_2)K_3 = K_1K_2K_3$ possible results for performing the first three operations. Continuing, the results of the first $N - 1$ operations may be considered to be those of performing a complex operation having $K_1K_2 \cdots K_{N-1}$ possible results, so that by the corollary to Theorem A.1 the number of possible results of performing all N operations is $(K_1K_2 \cdots K_{N-1})K_N = K_1K_2 \cdots K_N$.
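The multiplication rule can be checked by brute force for small cases. In this Python sketch the numbers of possible results `K` are hypothetical values chosen for illustration; `itertools.product` lists every ordered sequence of results, one per operation, and the count is compared with $K_1K_2 \cdots K_N$.

```python
from itertools import product
from math import prod

# Hypothetical numbers of possible results K_1, ..., K_N for N = 4 operations.
K = [3, 2, 5, 4]

# Operation i has results labelled 0, ..., K_i - 1; product() enumerates every
# ordered sequence consisting of one result from each operation.
all_results = list(product(*(range(k) for k in K)))

print(len(all_results), prod(K))  # 120 120
```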

Definition of permutation. Let a set consist of N elements $A_1, \ldots, A_N$. Then an n-permutation consists of any n of those elements arranged in a specified order. An N-permutation of N elements is called a permutation.

Thus, two n-permutations of $A_1, \ldots, A_N$ will differ either if they contain different elements or if they contain the same elements arranged in different orders. Applying Theorem A.2 with $K_i = N - i + 1$, $i = 1, \ldots, n$ (Illustration A.1(b) is the case N = 10, n = 2), we obtain the following theorem:

Theorem A.3. The number of n-permutations of N elements is

$$N(N-1)\cdots(N-n+1) = \frac{N!}{(N-n)!}$$

Definition of combination. Let a set consist of N elements $A_1, A_2, \ldots, A_N$. Then an n-combination consists of any n of these elements.

Thus, two n-combinations of $A_1, \ldots, A_N$ will differ only if they do not contain exactly the same elements.





38 PROBABILITY Ch. 2


Illustration A.2. (a) The number of 2-permutations of A, B, C is 6. 
They are AB, AC, BA, BC, CA, CB. 

(b) The number of 2-combinations of A, B, C is 3. They are AB, AC, BC. The permutations AB and BA are distinct permutations, but they are two ways of stating the same combination.

Theorem A.4. Let n elements be selected without replacement from N elements and let two selections be considered different if they are distinct n-combinations. Then the number of possible distinct selections, or n-combinations of N elements, is

$$\binom{N}{n} = \frac{N(N-1)\cdots(N-n+1)}{n!} = \frac{N!}{n!\,(N-n)!}$$


Proof. We have already seen (Theorem A.3) that there are $N(N-1)\cdots(N-n+1)$ possible selections of n elements from N elements without replacement. Consider a specific selection of n elements. There will be n! possible arrangements or n-permutations of these n elements, each of which will occur among the $N(N-1)\cdots(N-n+1)$ possible selections or n-permutations of N elements, i.e., each distinct combination gives rise to n! permutations. Thus,

$$n! \cdot (\text{Number of combinations}) = \text{Number of selections} = N(N-1)\cdots(N-n+1)$$

or

$$\text{Number of combinations} = \frac{N(N-1)\cdots(N-n+1)}{n!}$$

which completes the proof.
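The counting formulas of Theorems A.3 and A.4, and the relation n! · (number of combinations) = (number of n-permutations) used in the proof, can be verified numerically. A minimal Python sketch, assuming N = 5 and n = 3 purely for illustration; `math.perm` and `math.comb` are the standard-library functions (Python 3.8 or later) for these two counts.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

N, n = 5, 3
elements = ["A1", "A2", "A3", "A4", "A5"]

# Theorem A.3: number of n-permutations = N(N-1)...(N-n+1) = N!/(N-n)!
falling = 1
for k in range(N, N - n, -1):
    falling *= k
assert falling == factorial(N) // factorial(N - n) == perm(N, n)

# Theorem A.4: number of n-combinations = N(N-1)...(N-n+1)/n!
assert falling // factorial(n) == comb(N, n)

# Brute-force counts by explicit enumeration.
n_perms = sum(1 for _ in permutations(elements, n))
n_combs = sum(1 for _ in combinations(elements, n))
print(n_perms, n_combs)  # 60 10

# Each combination gives rise to n! permutations (proof of Theorem A.4).
assert n_perms == factorial(n) * n_combs
```

The same functions reproduce Illustration A.2: `perm(3, 2)` is 6 and `comb(3, 2)` is 3.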


REFERENCES 

(1) H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

(2) W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, John Wiley & Sons, New York, 1950.


CHAPTER 3 


Fundamental Theory—Random Variables, 
Expected Values, Variances, Covariances, 
and Convergence in Probability


1. Introduction. In this chapter we establish the theorems we shall use
in deriving the expected values, variances, and covariances that are needed 
for the proofs given in the subsequent chapters. Of these the more 
important are the following: (1) The expected value of a sum of random 
variables is the sum of their expected values (Theorem 5, Sec. 3, p. 48). 
More generally, the expected value of a linear combination of random 
variables is the same linear combination of their expected values (Theorem 
6, Sec. 3, p. 49). (2) The expected value of a random variable is the 
expected value of the conditional expected value of that random variable 
(Theorem 14, Sec. 5, p. 61). 

Theorems 11 and 12 of Sec. 4 (pp. 56 and 57) on the variances and 
covariances of linear combinations of random variables greatly simplify 
the problem of deriving variances in a number of practical sampling
designs. Theorems 15, 16, and 17 of Sec. 6 (pp. 65 and 68) on
conditional variances simplify greatly the development of the variance for 
a multi-stage sampling design in terms of its components. 

2. Random variables: mathematical expectation. a. Definition of random variable. The values of one or more variables are usually associated with the elements of a population. For example, if a population consists of N families $A_1, \ldots, A_N$, then associated with the ith family are values of such variables as the age of the head of the family, the number of children under 18 years of age, and the annual family income. If the elements $A_1, \ldots, A_N$ of the population are farms, then associated with $A_i$, the farm identified by i, are such variables as the number of acres, yields of different crops, income, and expenditures. If the elements $A_1, A_2, \ldots, A_N$ of the population are the N possible samples according to some sampling plan (see Ch. 2, Sec. 4), then with each $A_i$ is associated the value of the estimate or estimates that will be obtained if that possible



40 EXPECTATION AND CONVERGENCE Ch. 3 

sample is selected to be the sample. The value of any variable, U, that is associated with $A_i$ will be called $U_i$. Thus, in the above discussion U might stand for age of the head of the family and $U_i$ would be the age of the head of the family identified by i, or U might stand for estimate of average family income of this population and $U_i$ might be the value of that estimate for the ith possible sample. Then, if the probability of selecting $A_i$ is $P_i$ and if we denote by u the value of U that we obtain when one of the elements $A_1, \ldots, A_N$ is selected, it follows that u has possible values $U_1, \ldots, U_N$ and

$$\Pr(u = U_i) = \Pr(A_i) = P_i, \qquad i = 1, \ldots, N$$

where $\Pr(u = U_i)$ is read “the probability that $u = U_i$.” It should be noted that some of the values $U_1, \ldots, U_N$ may be the same numerically, but they are distinguishable by the different subscripts that associate them with different elements of the population.

In a case such as that described above, we call u a random variable, i.e., u is called a random variable if it has a finite number of possible values $U_1, U_2, \ldots, U_N$, and if with each possible value $U_i$ there is associated a probability

$$\Pr(u = U_i) = P_i, \qquad i = 1, \ldots, N$$

where

$$P_i \ge 0 \quad \text{and} \quad P_1 + \cdots + P_N = 1$$

We can treat any real single-valued function defined on a finite 
population as a random variable; the probabilities of the possible values 
of the function are the probabilities of selecting the corresponding 
elements of the population. 

It will be noted that we do not require that the probabilities $P_i$ all be positive. Thus, some of the so-called “possible values” of u may have zero probability of being assumed.

Illustration 2.1. Suppose that N = 5 and that to $A_1$, $A_2$, $A_3$, $A_4$, and $A_5$ correspond the values $U_1 = 2$, $U_2 = 2$, $U_3 = 1$, $U_4 = 6$, $U_5 = 3$. Then

$$\Pr(u = U_1) = \Pr(A_1), \quad \text{and} \quad \Pr(u = 2) = \Pr(A^*)$$

where $A^*$ consists of $A_1$ and $A_2$, so that

$$\Pr(u = 2) = P_1 + P_2$$

Also

$$\Pr(u < 4) = \Pr(A^*)$$

where $A^*$ consists of $A_1$, $A_2$, and $A_5$, so that

$$\Pr(u < 4) = P_1 + P_2 + P_5$$




Sec. 2 RANDOM VARIABLES 41


Suppose that, in addition to the values of u above, the values $V_1 = 8$, $V_2 = 3$, $V_3 = 5$, $V_4 = 1$, $V_5 = 3$ are also associated with the $A_i$. Then, if $u < 4$, it follows that $A^*$ occurs, where $A^*$ consists of $A_1, A_2, A_5$; if $v < 6$, it follows that $B^*$ occurs, where $B^*$ consists of $A_2, A_3, A_4$, and $A_5$; if both $u < 4$ and $v < 6$, then $A^*B^*$ occurs, where $A^*B^*$ consists of only $A_2$ and $A_5$, since only if $A_2$ or $A_5$ occurs will we have both $u < 4$ and $v < 6$. Hence,

$$\Pr(u < 4, v < 6) = \Pr(A^*B^*) = P_2 + P_5$$

Also we can evaluate the conditional probabilities that these random variables assume certain possible values. For example,

$$\Pr(u < 4 \mid v < 6) = \Pr(A^* \mid B^*) = \frac{\Pr(A^*B^*)}{\Pr(B^*)} = \frac{P_2 + P_5}{P_2 + P_3 + P_4 + P_5}$$


We now need to define independent random variables. 

Suppose that $u_1, \ldots, u_k$ are k random variables and that $u_g$ has possible values $U_{g1}, U_{g2}, \ldots, U_{gN_g}$, $g = 1, \ldots, k$. Then the random variables $u_1, \ldots, u_k$ are said to be independent if and only if

$$\Pr(u_1 = U_{1i_1}, u_2 = U_{2i_2}, \ldots, u_k = U_{ki_k}) = \Pr(u_1 = U_{1i_1})\Pr(u_2 = U_{2i_2})\cdots\Pr(u_k = U_{ki_k})$$

for all possible values of $u_1, \ldots, u_k$, where $U_{gi_g}$ is any one of the possible values of $u_g$, $g = 1, \ldots, k$.

Suppose now that two elements are being selected from a population consisting of $U_1, \ldots, U_N$. Denote by $u_1$ and $u_2$ the results of the two selection operations. Now, if we select at random with replacement (see Ch. 2, Sec. 4c), it follows that

$$\Pr(u_1 = U_i, u_2 = U_j) = \Pr(u_1 = U_i)\Pr(u_2 = U_j)$$

whereas if we select at random without replacement (see Ch. 2, Sec. 4m), then

$$\Pr(u_1 = U_i, u_2 = U_j) = \begin{cases} 0 & \text{if } i = j \\[4pt] \dfrac{1}{N(N-1)} & \text{if } i \ne j \end{cases}$$

so that in the latter case

$$\Pr(u_1 = U_i, u_2 = U_j) \ne \Pr(u_1 = U_i)\Pr(u_2 = U_j)$$

These results hold not only for two but also for n selections so that,













whereas the random variables that are the results of the individual selections at random with replacement are independent, those that are the results of the individual selections at random without replacement are dependent.
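To make the contrast concrete, the following Python sketch (a hypothetical population of N = 4 elements, with function names of our own choosing) computes the joint probabilities of the two draws under each scheme, obtains the marginal probabilities by summation, and tests the product condition in the definition of independence. Exact fractions are used so that the equality test is not disturbed by rounding.

```python
from fractions import Fraction
from itertools import product

N = 4  # hypothetical population size; elements indexed 0, ..., N - 1

def joint_with_replacement(i, j):
    # Two independent draws, each selecting any element with probability 1/N.
    return Fraction(1, N) * Fraction(1, N)

def joint_without_replacement(i, j):
    # Ordered pair of distinct elements, all N(N - 1) pairs equally likely.
    return Fraction(0) if i == j else Fraction(1, N * (N - 1))

def marginals(joint):
    # Pr(u1 = U_i) and Pr(u2 = U_j), obtained by summing the joint probabilities.
    p1 = [sum(joint(i, j) for j in range(N)) for i in range(N)]
    p2 = [sum(joint(i, j) for i in range(N)) for j in range(N)]
    return p1, p2

def is_independent(joint):
    # Product condition of the definition, checked for every pair of values.
    p1, p2 = marginals(joint)
    return all(joint(i, j) == p1[i] * p2[j] for i, j in product(range(N), repeat=2))

print(is_independent(joint_with_replacement))     # True
print(is_independent(joint_without_replacement))  # False
```

Without replacement each marginal probability is still 1/N, yet the joint probability of drawing the same element twice is 0 rather than $1/N^2$, which is what breaks independence.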

In Feller (2), p. 87, and Cramér (1), p. 162, will be found simple examples showing that even if the random variables $u_1, \ldots, u_k$ are pairwise independent, i.e., $u_i$ and $u_j$ are independent for all $i \ne j$, it may still be true that the random variables $u_1, \ldots, u_k$ are dependent. However, from the definition of independent random variables, it is not difficult to prove that if $u_1, \ldots, u_k$ are independent then the random variables in any subset, say $u_1, \ldots, u_h$, $h < k$, of $u_1, \ldots, u_k$ are independent. To see this for, say, $u_1$ and $u_3$, we note that by Ch. 2, Sec. 6, Theorem 5 (p. 28) it follows that, if $u_1$, $u_2$, and $u_3$ are independent, and if $U_{1i}$ and $U_{3l}$ are particular values of $u_1$ and $u_3$, and $u_2$ takes on possible values $U_{2j}$, $j = 1, 2, \ldots, m_2$, then

$$\Pr(u_1 = U_{1i}, u_3 = U_{3l}) = \sum_{j=1}^{m_2} \Pr(u_1 = U_{1i})\Pr(u_2 = U_{2j})\Pr(u_3 = U_{3l}) = \Pr(u_1 = U_{1i})\Pr(u_3 = U_{3l})$$

since

$$\sum_{j=1}^{m_2} \Pr(u_2 = U_{2j}) = 1$$
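One standard example of pairwise independence without mutual independence (offered here as an illustration; it may differ from the examples in the cited references) takes two independent fair 0/1 variables x and y and sets z = (x + y) mod 2. The Python sketch below checks the product condition for every pair and for the triple.

```python
from fractions import Fraction
from itertools import combinations, product

# Outcomes (x, y, z): x and y are independent fair 0/1 variables and
# z = (x + y) mod 2, so each of the 4 outcomes has probability 1/4.
outcomes = {(x, y, (x + y) % 2): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

def prob(condition):
    # Total probability of the outcomes satisfying `condition`.
    return sum(p for outcome, p in outcomes.items() if condition(outcome))

def factorizes(indices):
    # True if, for every combination of values of the chosen coordinates,
    # the joint probability equals the product of the marginal probabilities.
    for values in product((0, 1), repeat=len(indices)):
        joint = prob(lambda o: all(o[k] == v for k, v in zip(indices, values)))
        marginal_product = Fraction(1)
        for k, v in zip(indices, values):
            marginal_product *= prob(lambda o, k=k, v=v: o[k] == v)
        if joint != marginal_product:
            return False
    return True

pairwise = all(factorizes(pair) for pair in combinations(range(3), 2))
mutual = factorizes((0, 1, 2))
print(pairwise, mutual)  # True False
```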

A generalization of the term random variable that will be helpful is the 
term random event. 

If, when an operation is performed, one of the K exhaustive and mutually exclusive events (see Ch. 2, Sec. 6) $A_1^*, \ldots, A_K^*$ must occur, we call the result of that operation a random event, and denote it by $a^*$. We call $A_1^*, \ldots, A_K^*$ the possible states of the random event.

If $\Pr(A_i^*) = P_i$ we will write $\Pr(a^* = A_i^*) = P_i$, where, if $a^*$ is a random event and $A_i^*$ is one of its possible states, the statement “$a^* = A_i^*$” is to be read “$a^*$ takes on the state $A_i^*$.”

Thus the result of performing a selection operation to select one of N elements $A_1, \ldots, A_N$ of a population is a random event $a^*$ having possible states $A_1, \ldots, A_N$ and

$$\Pr(a^* = A_i) = P_i$$

where $P_i$ is the probability that $A_i$ is obtained when the selection operation is performed.

Illustration 2.2. (a) One of 5 blocks $C_1, \ldots, C_5$ is to be selected. The result of that selection we call a random event with 5 possible states; these are “$C_1$ is selected,” ..., “$C_5$ is selected.”

(b) Two of the 5 blocks are selected by an epsem. Then the result is a random event having 20 possible states (the ordered pairs of distinct blocks), each of probability 1/20.




(c) A sample is selected. Numerical information is obtained for the 
sample and an estimate is calculated. Then, the result of the selection 
is a random event, and, as previously mentioned, the estimate is a random 
variable. 


The definition of random event includes that of random variable if by the states of the event we mean the taking on by the random variable of its possible values.

Let us consider how the random event $a^*$ and u, the value of some variable for the event, are related.

First, although $a^*$ need not be numerically valued, u must be, according to the definitions we have given. The random event may be the selection of a block or a house or a person or a set of them; but the random variable must refer to something like the number of families on the block, the income of all persons living in the house, or the age of the person.

Second, a random event may and often will determine the values of several random variables. If we denote by $U_i$ the number of persons, by $V_i$ the total income, and by $W_i$ the total expenditures of the ith family $A_i$, then the selection of the ith family implies that

$$u = U_i, \quad v = V_i, \quad \text{and} \quad w = W_i$$

so that

$$\Pr(u = U_i, v = V_i, w = W_i) = \Pr(a^* = A_i) = P_i$$

Third, sometimes the elements of the population may be taken to be numbers. If we wish to estimate the average size of family, then we may say that the elements of the population are $U_1, U_2, \ldots, U_N$, where $U_i$ is the number of persons in the ith family. However, it should be kept in mind that the probabilities of selection are unaffected by choosing to represent an element by the values of certain variables for that element.

Fourth, just as we are interested in the probability of a random event which consists of several of the elements of $\mathcal{A}$, so we are concerned with the probability that u takes on one of its possible numerical values. Let $A'$ be a subset of the possible values of u. Then, since the values of u are determined by which of the $A_i$ are selected, it is clear that

$$\Pr(u \in A') = \Pr(a^* \in A^*) \tag{2.1}$$

where $\in$ is read “is an element of,” and $A^*$ consists of all elements of $\mathcal{A}$ such that if one of these elements is selected then u takes on one of the values contained in $A'$.






Let us note that if $u_1, \ldots, u_k$ are random variables, then functions such as those given below will also be random variables. For example, if

$$\bar{u} = \frac{1}{k}\sum_{i=1}^{k} u_i \quad \text{and} \quad s^2 = \frac{\sum_{i=1}^{k}(u_i - \bar{u})^2}{k - 1}$$

then $\bar{u}$, $c\bar{u}$ (where c is a constant), s, and $s/\bar{u}$ are all random variables. To see this, let us recall that to define a random variable we need only state its possible values and their probabilities. To find the possible values of a function of the k random variables $u_1, \ldots, u_k$, we must give to $u_1, \ldots, u_k$ all their possible values. The possible value of the function that occurs when $u_1 = U_{1i_1}, \ldots, u_k = U_{ki_k}$ then has probability $\Pr(u_1 = U_{1i_1}, \ldots, u_k = U_{ki_k})$. Usually, Eq. 2.1 is used to evaluate the probabilities of the possible values of functions of random variables.

Illustration 2.3. Let $u_1$ have the possible values 0, 1, let $u_2$ have the possible values 1, 2, 3, and let $\Pr(u_1 = i, u_2 = j) = P_{ij}$, $i = 0, 1$, $j = 1, 2, 3$. Then, if we define $u = u_1 + u_2$, it follows that u has possible values 0 + 1, 0 + 2, 0 + 3, 1 + 1, 1 + 2, and 1 + 3, or 1, 2, 3, 2, 3, 4, or more concisely 1, 2, 3, 4. Also $\Pr(u = 1) = P_{01}$, $\Pr(u = 2) = P_{02} + P_{11}$, $\Pr(u = 3) = P_{03} + P_{12}$, and $\Pr(u = 4) = P_{13}$. As an exercise, determine the possible values and probabilities of $u_1u_2$ and $u_1/u_2$.
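The rule behind Eq. 2.1 is mechanical: the probability of a possible value of a function is the total probability of all combinations of values that produce it. The Python sketch below applies it to $u = u_1 + u_2$ of Illustration 2.3. The numerical values chosen for $P_{ij}$ are hypothetical, since the illustration leaves them general, and the helper `distribution_of` is our own name.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint probabilities P_ij = Pr(u1 = i, u2 = j); they sum to 1.
P = {
    (0, 1): Fraction(1, 12), (0, 2): Fraction(2, 12), (0, 3): Fraction(1, 12),
    (1, 1): Fraction(3, 12), (1, 2): Fraction(2, 12), (1, 3): Fraction(3, 12),
}
assert sum(P.values()) == 1

def distribution_of(func, joint):
    """Pr(func(u1, u2) = value), obtained by adding the probabilities of all
    pairs of possible values that give the same function value (Eq. 2.1)."""
    dist = defaultdict(Fraction)
    for (a, b), p in joint.items():
        dist[func(a, b)] += p
    return dict(sorted(dist.items()))

dist_sum = distribution_of(lambda a, b: a + b, P)

# Pr(u = 2) = P_02 + P_11 and Pr(u = 3) = P_03 + P_12, as in the text.
assert dist_sum[2] == P[(0, 2)] + P[(1, 1)]
assert dist_sum[3] == P[(0, 3)] + P[(1, 2)]
print(dist_sum)
```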

b. Definition of mathematical expectation. Before discussing the intuitive meaning of mathematical expectation let us define it and train ourselves in its computation.

If u is a random variable with possible values $U_1, \ldots, U_N$ and probabilities $P_1, \ldots, P_N$, then the mathematical expectation or expected value of u is

$$Eu = P_1U_1 + P_2U_2 + \cdots + P_NU_N = \sum_{i=1}^{N} P_iU_i$$

Thus, to calculate the expected value of u, we multiply each possible value by its probability and add the products thus obtained.
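The definition translates directly into code. A minimal Python sketch, assuming the distribution is given as parallel lists of values and probabilities (a hypothetical interface); repeated values are allowed, in line with the convention that values are distinguished by their subscripts. Exact fractions keep the check that the probabilities sum to 1 free of rounding error.

```python
from fractions import Fraction

def expected_value(values, probs):
    """Eu = P_1 U_1 + ... + P_N U_N for a random variable with finitely many values."""
    if len(values) != len(probs):
        raise ValueError("need one probability per possible value")
    if any(p < 0 for p in probs) or sum(probs) != 1:
        raise ValueError("probabilities must be non-negative and sum to 1")
    return sum(p * u for u, p in zip(values, probs))

# Hypothetical example: values 4, 0, 7, 2, 2 with probabilities .1, .2, .3, .2, .2
values = [4, 0, 7, 2, 2]
probs = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(2, 10), Fraction(2, 10)]
print(expected_value(values, probs))  # 33/10
```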

Illustration 2.4. Suppose that u can take on the values 1 and 0 with 
probabilities J and Then 

Eu — l(i) + 0 ( 4 ) — i 

Illustration 2.5. Suppose that $u_1$ and $u_2$ are independent, $u_1$ takes on the values 0, 1, $u_2$ takes on the values 1, 2, and 3, and $u = u_1 + u_2$. Then, as shown in Illustration 2.3, u has possible values 1, 2, 3, and 4, and, using the assumption of independence, we have

$$\begin{aligned}
\Pr(u = 1) &= \Pr(u_1 = 0)\Pr(u_2 = 1) \\
\Pr(u = 2) &= \Pr(u_1 = 0)\Pr(u_2 = 2) + \Pr(u_1 = 1)\Pr(u_2 = 1) \\
\Pr(u = 3) &= \Pr(u_1 = 0)\Pr(u_2 = 3) + \Pr(u_1 = 1)\Pr(u_2 = 2) \\
\Pr(u = 4) &= \Pr(u_1 = 1)\Pr(u_2 = 3)
\end{aligned}$$

Hence

$$Eu = 1 \cdot \Pr(u = 1) + 2 \cdot \Pr(u = 2) + 3 \cdot \Pr(u = 3) + 4 \cdot \Pr(u = 4)$$

As exercises compute $Eu_1u_2$ and $E(u_1/u_2)$ in this illustration by determining possible values and probabilities, and using the definition of independence.

Illustration 2.6. A population contains M elements, of which $M_i$ have the value $U_i$, $i = 1, \ldots, K$; $M_1 + \cdots + M_K = M$. One element of the population is selected by an epsem (Sec. 4, Ch. 2). Let us find the mathematical expectation of the value associated with the selected element. Since an epsem is used, the probability of selecting an element that has value $U_i$ associated with it is $M_i/M$. Hence, if u denotes the value associated with the selected element, it follows that

$$Eu = \sum_{i=1}^{K} \frac{M_i}{M} U_i$$

If the element of the population is selected by a method that gives probability $P_{ij}$ to the jth of the $M_i$ elements that have the value $U_i$, then

$$Eu = \sum_{i=1}^{K} P_iU_i$$

where

$$P_i = \sum_{j=1}^{M_i} P_{ij}$$

since $P_i$ is the probability that u takes on the value $U_i$ (Ch. 2, Sec. 6, Theorem 5, p. 28).


We now consider the intuitive meaning of expected value. Suppose that an operation is performed N times and that $U_i$ occurs $N_i$ times, $N_1 + \cdots + N_K = N$. Then, the average value of u in these N performances is

$$\bar{u} = \frac{N_1U_1 + \cdots + N_KU_K}{N} = \frac{N_1}{N}U_1 + \cdots + \frac{N_K}{N}U_K$$

where $N_i/N$ is the relative frequency of the occurrence of $U_i$ in the N performances. As the number, N, of performances increases, we expect



the relative frequencies $N_i/N$ to come closer to the probabilities $P_i$, so that $\bar{u}$ will come close to Eu as we have defined it. Since we will be selecting samples by methods, such as the epsem, that assign known probabilities to each of the possible values, it follows that we can calculate Eu and thus learn to what value the average result of selecting samples by this same method would be expected to tend. (This result is obtained in Sec. 7.)
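The tendency of the average of repeated results toward Eu, established formally in Sec. 7, can be seen in a simulation. The Python sketch below uses a hypothetical random variable with values 1, 2, 3, 4 and probabilities .1, .2, .3, .4 (so Eu = 3) and draws from it with `random.Random.choices`; the fixed seed only makes the run reproducible. It illustrates, but does not prove, the convergence.

```python
import random

# Hypothetical random variable: possible values and their probabilities.
values = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]
Eu = sum(p * u for u, p in zip(values, probs))  # 3.0, up to floating-point rounding

rng = random.Random(12345)  # fixed seed so the run is reproducible

def average_of_draws(n_draws):
    # Perform the selection n_draws times and average the results; this equals
    # the frequency-weighted mean (N_1/N) U_1 + ... + (N_K/N) U_K.
    draws = rng.choices(values, weights=probs, k=n_draws)
    return sum(draws) / n_draws

for n in (10, 100, 10_000, 100_000):
    print(n, average_of_draws(n), Eu)
```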

Let us prove two simple theorems about expected values. 

Theorem 1. If the possible values of the random variable u are non-negative, i.e., if $U_i \ge 0$, $i = 1, \ldots, N$, the expected value of u is non-negative.

Proof. Since $P_1, \ldots, P_N$ are non-negative, it follows that

$$Eu = \sum_{i=1}^{N} P_iU_i$$

is a sum of non-negative terms, and hence

$$Eu \ge 0$$


Theorem 2. If f(u) and g(u) are two functions of u, where the possible values of u are $U_1, \ldots, U_N$, such that

$$f(U_i) \le g(U_i), \qquad i = 1, \ldots, N \tag{2.2}$$

then

$$Ef(u) \le Eg(u)$$

Proof. Since

$$Eg(u) - Ef(u) = \sum_{i=1}^{N} P_i\,[g(U_i) - f(U_i)]$$

the theorem follows from Eq. 2.2 and the fact that probabilities are non-negative.

c. Biased and unbiased estimates. Let u be an estimate and let U be the quantity we wish to estimate by u. Then, since the value u takes on is determined by the particular sample that is selected, it follows that u is a random variable. If Eu = U, we call u an unbiased estimate of U. Otherwise we call u a biased estimate of U, and refer to the difference, $Eu - U$, as the bias.

Then, if u has possible values $U_1, \ldots, U_N$ with probabilities $P_1, \ldots, P_N$, it will follow that u is an unbiased estimate of $\sum_{i=1}^{N} P_iU_i$.

It is not necessarily an advantage to use unbiased estimates, since it 
has been found that often they are not as good as biased estimates in the 
sense that the biased estimates may tend to come close to the quantities 
that one wishes to estimate with higher probability than the unbiased 



Sec. 3 THEOREMS ON MATHEMATICAL EXPECTATION 47 

estimates. We delay any further discussion of biased and unbiased 
estimates to Sec. 7 of this chapter. 

Exercises 

2.1. (a) A population consists of the 5 elements $A_1, A_2, A_3, A_4, A_5$ having the values 1, 1, 3, 5, 10, respectively (say $A_i$ represents the ith family and the associated values are sizes of family, or the $A_i$ are businesses and the associated values are incomes). One of the elements is selected by an epsem and the associated value is taken to be an estimate, u. What is the expected value of u?

(b) Select 100 samples of size 1 from this same population, using a table of random numbers. Compare the average value of the first 10, first 25, first 50, and all 100 samples with the expected value.

2.2. Repeat both parts of Ex. 2.1 when the element is selected so that the probabilities of selecting $A_1$, $A_2$, $A_3$, $A_4$, and $A_5$ are .2, .4, .1, .1, and .2, respectively. Of what quantity is u an unbiased estimate?

2.3. Let u have possible values 1 and 0 with $\Pr(u = 1) = P$ and $\Pr(u = 0) = Q = 1 - P$. Calculate Eu, $Eu^2$, $Eu^3$, and $Eu^t$, where $t > 0$.

2.4. Let u have possible values 1, 2, 3, 4, 5 with probabilities i i 

Calculate Eu. Of what quantity is u an unbiased estimate ? 

2.5. Let u have possible values - 1, 0 , 1 with probabilities J, J, J. Calculate 
Eu. 

2.6. In Ex. 2.4 calculate Eu^ and Eu^. 

2.7. In Ex. 2.5 calculate Eu^ and Eu^. Of what quantities are and 
unbiased estimates ? 

3. Some theorems on mathematical expectation. The following 
theorems are helpful in computing expected values, since they reduce the 
calculation of the expected values of relatively complicated random 
variables to the calculation of the expected values of simpler random 
variables. 

Theorem 3. If c and d are constants and u is a random variable, then

$$E(cu + d) = cEu + d$$

Note that this theorem implies that Ed = d.

Proof. If the possible values of u are $U_1, \ldots, U_N$ with probabilities $P_1, \ldots, P_N$, then $cu + d$ has possible values $cU_i + d$, $i = 1, \ldots, N$, with probabilities $P_1, \ldots, P_N$, so that

$$E(cu + d) = \sum_{i=1}^{N} P_i(cU_i + d) = c\sum_{i=1}^{N} P_iU_i + d\sum_{i=1}^{N} P_i = cEu + d$$

Let u and v be random variables, u having possible values $U_1, \ldots, U_N$, v having possible values $V_1, \ldots, V_M$, and $\Pr(u = U_i, v = V_j) = P_{ij}$, $i = 1, \ldots, N$, $j = 1, \ldots, M$. We consider the occurrence of both $U_i$ and $V_j$ to define an elementary event $A_{ij}$ with probability $P_{ij}$, $i = 1, \ldots, N$, $j = 1, \ldots, M$, so that there are NM elementary events, some of which may have probability zero. Let $A_i^*$ be the event $u = U_i$. Then $A_i^*$ occurs if and only if $A_{i1}$ or $A_{i2}$ or ... or $A_{iM}$ occurs. Hence

$$\Pr(u = U_i) = \Pr(A_i^*) = P_{i1} + \cdots + P_{iM} = P_{i\cdot} \tag{3.1}$$

where $P_{i\cdot}$ is, by definition, $\sum_{j=1}^{M} P_{ij}$. Similarly

$$\Pr(v = V_j) = P_{\cdot j} \tag{3.2}$$

where

$$P_{\cdot j} = \sum_{i=1}^{N} P_{ij}$$

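Theorem 4. If u and v are random variables then

$$E(u + v) = Eu + Ev$$

Proof. From the definition of expected value, we have

$$\begin{aligned}
E(u + v) &= \sum_{i=1}^{N}\sum_{j=1}^{M} P_{ij}(U_i + V_j) \\
&= \sum_{i=1}^{N}\sum_{j=1}^{M} P_{ij}U_i + \sum_{i=1}^{N}\sum_{j=1}^{M} P_{ij}V_j \\
&= \sum_{i=1}^{N} P_{i\cdot}U_i + \sum_{j=1}^{M} P_{\cdot j}V_j \\
&= Eu + Ev
\end{aligned}$$

from Eqs. 3.1 and 3.2 above.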
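Theorem 4 does not require independence. The Python sketch below uses a hypothetical joint table $P_{ij}$ in which u and v are dependent (for instance $P_{12} = 0$ although neither corresponding marginal probability is zero), forms the marginal probabilities of Eqs. 3.1 and 3.2, and compares E(u + v) computed from the joint table with Eu + Ev.

```python
from fractions import Fraction

# Hypothetical joint probabilities P[i][j] = Pr(u = U[i], v = V[j]); u and v are dependent.
U = [0, 1, 2]
V = [10, 20]
P = [
    [Fraction(1, 6), Fraction(0)],
    [Fraction(1, 6), Fraction(1, 6)],
    [Fraction(0), Fraction(1, 2)],
]

# Marginal probabilities, Eqs. 3.1 and 3.2: P_i. = sum_j P_ij and P_.j = sum_i P_ij.
P_i_dot = [sum(row) for row in P]
P_dot_j = [sum(P[i][j] for i in range(len(U))) for j in range(len(V))]

Eu = sum(p * u for p, u in zip(P_i_dot, U))
Ev = sum(p * v for p, v in zip(P_dot_j, V))

# E(u + v) computed directly from the joint distribution.
E_sum = sum(P[i][j] * (U[i] + V[j]) for i in range(len(U)) for j in range(len(V)))

print(E_sum, Eu + Ev)  # 18 18
```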

Using Theorems 3 and 4, we proceed to prove two generalizations that 
are very important. 

Theorem 5. The expected value of a sum of random variables is the sum of their expected values.

Proof. Let $u_1, \ldots, u_r$ be random variables and let $u = \sum_{i=1}^{r} u_i$. Then we want to show that

$$Eu = \sum_{i=1}^{r} Eu_i$$

We prove Theorem 5 by induction, i.e., we prove that if the theorem is true when r = 1, then it is true when r = 2; and if it is true when r = 2, then it is true when r = 3; and so on; so that the theorem is proved for all positive integral values of r. The theorem is an identity if r = 1. In Theorem 4 we have proved the theorem to be true for r = 2. Now suppose that the theorem is true for $r - 1$ random variables. If $v = \sum_{i=1}^{r-1} u_i$, then by the hypothesis of the induction we have $Ev = \sum_{i=1}^{r-1} Eu_i$. Also, $u = v + u_r$ and, by Theorem 4, we have

$$Eu = Ev + Eu_r$$

which completes the proof.

Theorem 6. The expected value of a linear combination of random 
variables is the same linear combination of the expected values of these 
random variables. 

Proof. Let $u_1, \ldots, u_r$ be random variables and let $c_1, \ldots, c_r$ be constants. Let

$$u = \sum_{i=1}^{r} c_iu_i$$

be the linear combination of $u_1, \ldots, u_r$. Then we want to show that

$$Eu = \sum_{i=1}^{r} c_iEu_i$$

By Theorem 5

$$E\sum_{i=1}^{r} c_iu_i = \sum_{i=1}^{r} Ec_iu_i$$

By Theorem 3

$$Ec_iu_i = c_iEu_i$$

Combining these results, the proof is completed. It should be noted that since $u_1, \ldots, u_r$ are any random variables they may be functions of other random variables, i.e., if $u_i = f_i(v_1, \ldots, v_n)$, where $v_1, \ldots, v_n$ are random variables, and if $f(v_1, \ldots, v_n) = \sum_{i=1}^{r} c_if_i(v_1, \ldots, v_n)$, then

$$Ef(v_1, \ldots, v_n) = \sum_{i=1}^{r} c_iEf_i(v_1, \ldots, v_n)$$

Illustration 3.1. (a) Let $u_1$, $u_2$, and $u_3$ be random variables with $Eu_i = i$ and $Eu_i^2 = 2i^2$, $i = 1, 2, 3$. Let $u = \sum_{i=1}^{3} iu_i$. Then, by Theorem 6,

$$Eu = \sum_{i=1}^{3} iEu_i = \sum_{i=1}^{3} i^2 = 14$$

Also, if $v = -u_1^2 + 2u_2^2$, then $Ev = -Eu_1^2 + 2Eu_2^2 = -2 + 16 = 14$.

(b) Let $E\log u_1 = \xi_1$ and $E10^{u_2} = \xi_2$, and let $u = 3\log u_1 + (1.7)10^{u_2}$. Then $Eu = 3\xi_1 + 1.7\xi_2$.
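A concrete instance of Illustration 3.1(a) can be built by letting $u_i$ take the values 0 and 2i with probability 1/2 each, which gives $Eu_i = i$ and $Eu_i^2 = 2i^2$. This choice, and the assumption that $u_1, u_2, u_3$ are independent, are ours and serve only to give a joint distribution to sum over; Theorem 6 itself needs no independence. The Python sketch computes Eu and Ev directly from that distribution.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Hypothetical choice: u_i takes the values 0 and 2i with probability 1/2 each,
# so Eu_i = i and Eu_i^2 = 2 i^2. Independence is assumed only to obtain a
# concrete joint distribution; Theorem 6 does not need it.
support = {i: (0, 2 * i) for i in (1, 2, 3)}

def expect(func):
    # E func(u_1, u_2, u_3), summed over the 8 equally likely joint outcomes.
    total = Fraction(0)
    for x1, x2, x3 in product(support[1], support[2], support[3]):
        total += half ** 3 * func(x1, x2, x3)
    return total

for i in (1, 2, 3):
    assert expect(lambda *x, i=i: x[i - 1]) == i
    assert expect(lambda *x, i=i: x[i - 1] ** 2) == 2 * i ** 2

Eu = expect(lambda x1, x2, x3: 1 * x1 + 2 * x2 + 3 * x3)  # u = sum of i * u_i
Ev = expect(lambda x1, x2, x3: -x1 ** 2 + 2 * x2 ** 2)    # v = -u_1^2 + 2 u_2^2
print(Eu, Ev)  # 14 14
```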


From the consideration of Theorem 6 and Illustration 3.1 it will be clear that, when we wish to calculate the expected value of a linear combination u of the random variables $u_1, \ldots, u_r$, we first use Theorem 6 to reduce the problem to one of computing the expected values of the component random variables $u_1, \ldots, u_r$. For the computation of these latter expected values, it will often be necessary to return to the definition of expected value.

Illustration 3.2. Let $u_i$ have the possible values $U_{i1}, \ldots, U_{iN_i}$ with probabilities $P_{i1}, \ldots, P_{iN_i}$, $i = 1, \ldots, r$, and let

$$u = c_1u_1 + \cdots + c_ru_r$$

Then, by Theorem 6,

$$Eu = c_1Eu_1 + \cdots + c_rEu_r \tag{3.3}$$

and by the definition of expected value

$$Eu_i = \sum_{j=1}^{N_i} P_{ij}U_{ij}, \qquad i = 1, \ldots, r \tag{3.4}$$

Computing the values of $Eu_i$ from Eq. 3.4, we substitute them in Eq. 3.3 to obtain Eu.

Further illustrations of the uses of Theorems 3-6 will be found in Secs. 1, 2, and 4 of Ch. 4.

Exercises 

3.1. Let $u = 4u_1 + u_2$, where $Eu_1 = 1$ and $Eu_2 = -7$. Calculate Eu.

3.2. Let $\bar{u}$ be the arithmetic mean of $u_1, u_2, \ldots, u_N$, where $u_i$ has 2 possible values, 1 and 0, with $\Pr(u_i = 1) = P_i$ and $\Pr(u_i = 0) = 1 - P_i$, $i = 1, \ldots, N$. Calculate $E\bar{u}$. What does $E\bar{u}$ become if $P_1 = \cdots = P_N = P$? If u is equal to $u_1 + u_2 + \cdots + u_N$, compute Eu. Of what quantity is u an unbiased estimate?

3.3. Let $u = u_1 + \cdots + u_N$, where $u_i$ has 2 possible values 1 and 0 with $\Pr(u_i = 1) = P_i$ and $\Pr(u_i = 0) = 1 - P_i$. Calculate Eu. What does Eu become if $P_1 = \cdots = P_N = P$? Of what quantity is u an unbiased estimate?

3.4. Let $u = 5u_1 + 3u_2$, where $u_1$ has possible values 1, 2, 3 which are equally
probable and u^ has possible values — 1 , 0 , 1 which have probabilities f, 

Compute Eu.

3.5. Let $\bar{u}$ be the arithmetic mean of $u_1, \ldots, u_n$, where $u_i$ has possible values $U_1, \ldots, U_N$ for all i and these possible values are equally probable. Compute $E\bar{u}$. If u is the total of $u_1, \ldots, u_n$, compute Eu. Of what quantity is u an unbiased estimate?

3.6. Show that, if $\bar{u}$ is the arithmetic mean of $u_1, \ldots, u_n$ and u is their sum, then it always follows that $Eu = nE\bar{u}$.

4. Variance, covariance, mean square error, rel-variance, coefficient of variation. The variance, $\sigma_u^2$, of the random variable u is defined by the equation

$$\sigma_u^2 = E(u - Eu)^2 \tag{4.1}$$

Thus, if u is a random variable with possible values $U_1, \ldots, U_N$ and probabilities $P_1, \ldots, P_N$, then $\sigma_u^2 = \sum_{i=1}^{N} P_i(U_i - Eu)^2$.


Sec. 4 VARIANCES AND COVARIANCES 51 

It is easy to see that $\sigma_u^2 \ge 0$, and that $\sigma_u^2 = 0$ only if $U_i = Eu$ for all i except, perhaps, those having probability 0.

The covariance of the two random variables u and v is defined by the equation

$$\sigma_{uv} = E(u - Eu)(v - Ev) \tag{4.2}$$

Thus, if u is a random variable with possible values $U_1, \ldots, U_N$ and v is a random variable with possible values $V_1, \ldots, V_M$, and if

$$P_{ij} = \Pr(u = U_i, v = V_j)$$

then

$$\sigma_{uv} = \sum_{i=1}^{N}\sum_{j=1}^{M} P_{ij}(U_i - Eu)(V_j - Ev)$$
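Equations 4.1 and 4.2 can be evaluated directly from a joint table $P_{ij}$. In this Python sketch the table and the helper name `moments` are hypothetical; all five quantities are obtained by summing over the cells of the table.

```python
from fractions import Fraction

def moments(U, V, P):
    """Return Eu, Ev, var(u), var(v), cov(u, v) from a joint table
    P[i][j] = Pr(u = U[i], v = V[j]), using Eqs. 4.1 and 4.2 directly."""
    cells = [(U[i], V[j], P[i][j]) for i in range(len(U)) for j in range(len(V))]
    Eu = sum(p * u for u, _, p in cells)
    Ev = sum(p * v for _, v, p in cells)
    var_u = sum(p * (u - Eu) ** 2 for u, _, p in cells)
    var_v = sum(p * (v - Ev) ** 2 for _, v, p in cells)
    cov_uv = sum(p * (u - Eu) * (v - Ev) for u, v, p in cells)
    return Eu, Ev, var_u, var_v, cov_uv

# Hypothetical joint distribution with N = 3 values of u and M = 2 values of v.
U = [0, 1, 2]
V = [10, 20]
P = [
    [Fraction(1, 6), Fraction(0)],
    [Fraction(1, 6), Fraction(1, 6)],
    [Fraction(0), Fraction(1, 2)],
]
print(moments(U, V, P))
```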


The variance and covariance of random variables are very important 
in sampling theory, since measures of efficiency are largely based on them. 

Also important in the interpretation of estimates are the standard deviation, $\sigma_u$, which is the positive square root of the variance; the mean square error, which is defined to be $E(u - U)^2$, where u is the estimate and U is the characteristic that is being estimated; and the rel-variance, which is defined to be

$$\frac{\sigma_u^2}{(Eu)^2} \tag{4.3}$$


The positive square root of the rel-variance is called the coefficient of variation. The relation between the variance and the mean square error will be found in Theorem 7 below. Inasmuch as the standard deviation, the mean square error, the rel-variance, and the coefficient of variation all depend so closely on the variance, we shall not discuss them in detail in this chapter. However, they are often used in later chapters.

Theorem 7. If c and d are any constants and if U = Eu, V = Ev, then

$$E(u - c)^2 = \sigma_u^2 + (U - c)^2 \tag{4.4}$$

i.e., the mean square error is the sum of the variance and the square of the bias. Hence, the minimum value of $E(u - c)^2$ occurs when c = U. Moreover,

$$E(u - c)(v - d) = \sigma_{uv} + (U - c)(V - d) \tag{4.5}$$

where $\sigma_{uv}$ is given by Eq. 4.2.

Proof. Since $u - c = u - U + U - c$,

$$(u - c)^2 = (u - U)^2 + 2(u - U)(U - c) + (U - c)^2 \tag{4.6}$$




Equation 4.4 follows from Theorem 6 and the fact that

$$E[(u - U)(U - c)] = (U - c)E(u - U) = 0$$

The fact that the minimum value of $E(u - c)^2$ occurs when c = U then follows from the fact that $(U - c)^2$ in Eq. 4.6 is greater than 0 whenever $U \ne c$. The proof that Eq. 4.5 holds follows the same reasoning as the proof that Eq. 4.4 holds. Note that Eq. 4.5 is Eq. 4.4 when u = v and c = d.

Corollary. If u is a random variable, then

$$\sigma_u^2 = Eu^2 - (Eu)^2 \tag{4.6}$$

$$Eu^2 \ge (Eu)^2 \tag{4.7}$$

$$(Eu^2)^{1/2} \ge E|u| \tag{4.8}$$

where |u| stands for the absolute value of u, i.e., $|u| = u$ if $u \ge 0$ and $|u| = -u$ if $u < 0$.

Proof. Equation 4.6 follows from Eq. 4.4 in Theorem 7, with c = 0. To prove that $Eu^2 \ge (Eu)^2$ we first note that from Theorem 1, $\sigma_u^2 = E(u - Eu)^2 \ge 0$. The proof thus follows from Eq. 4.6.

Finally, $(Eu^2)^{1/2} \ge E|u|$ is obtained from the following:

$$\begin{aligned}
\sigma_{|u|}^2 &= E|u|^2 - (E|u|)^2 && \text{from Eq. 4.6} \\
E|u|^2 &\ge (E|u|)^2 && \text{from Inequality 4.7 above}
\end{aligned}$$

and since $|u|^2 = u^2$ we have

$$Eu^2 \ge (E|u|)^2$$

and, since $E|u| \ge 0$,

$$(Eu^2)^{1/2} \ge E|u|$$
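Equation 4.4 and the corollary's three results can be checked numerically for any finite distribution. The values and probabilities in this Python sketch are hypothetical; the last inequality is tested in the squared form $(E|u|)^2 \le Eu^2$, which is equivalent because $E|u| \ge 0$, so that everything stays in exact rational arithmetic.

```python
from fractions import Fraction

# Hypothetical random variable u (some values negative, so that E|u| differs from Eu).
values = [-3, -1, 0, 2, 5]
probs = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(2, 10), Fraction(2, 10)]

def E(func):
    # Expected value of func(u).
    return sum(p * func(u) for u, p in zip(values, probs))

U = E(lambda u: u)                 # U = Eu
var_u = E(lambda u: (u - U) ** 2)  # Eq. 4.1

# Eq. 4.4: E(u - c)^2 = var(u) + (U - c)^2 for every constant c.
for c in (Fraction(-2), Fraction(0), Fraction(7, 3), U):
    assert E(lambda u: (u - c) ** 2) == var_u + (U - c) ** 2

# Corollary: Eqs. 4.6, 4.7, and 4.8 (the last in squared form).
assert var_u == E(lambda u: u ** 2) - U ** 2
assert E(lambda u: u ** 2) >= U ** 2
assert E(abs) ** 2 <= E(lambda u: u ** 2)
print(U, var_u)  # 9/10 609/100
```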

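Illustration 4.1. (a) Let u have possible values $U_1, \ldots, U_N$, all of which are equally probable, and let $\bar{U} = \frac{1}{N}\sum_{i=1}^{N} U_i$. Then

$$\sigma_u^2 = \frac{1}{N}\sum_{i=1}^{N}(U_i - \bar{U})^2$$

We have already seen that $Eu = \bar{U}$. Also, $\Pr(u = U_i) = \Pr\{(u - Eu)^2 = (U_i - Eu)^2\}$. Hence

$$E(u - Eu)^2 = \sum_{i=1}^{N} \frac{1}{N}(U_i - \bar{U})^2$$

We could also obtain this result by using Theorem 7 with c = 0. From Theorem 7 with c = 0 we have

$$\sigma_u^2 = Eu^2 - \bar{U}^2$$

Since $\Pr(u^2 = U_i^2) = \Pr(u = U_i)$ we have

$$Eu^2 = \frac{1}{N}\sum_{i=1}^{N} U_i^2$$

so that

$$\sigma_u^2 = \frac{1}{N}\sum_{i=1}^{N} U_i^2 - \bar{U}^2$$

Since

$$\sum_{i=1}^{N}(U_i - \bar{U})^2 = \sum_{i=1}^{N} U_i^2 - N\bar{U}^2$$

we have the desired result.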

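(b) Suppose that u has possible values $U_1, \ldots, U_N$, v has possible values $V_1, \ldots, V_N$, and that

$$\Pr(u = U_i, v = V_j) = 0 \quad \text{if } i \ne j$$

$$\Pr(u = U_i, v = V_i) = P_i, \qquad i = 1, \ldots, N$$

Then, since

$$\Pr(u = U_i) = \sum_{j=1}^{N} \Pr(u = U_i, v = V_j)$$

it follows that

$$\Pr(u = U_i) = P_i$$

and similarly

$$\Pr(v = V_i) = P_i, \qquad i = 1, \ldots, N$$

Hence,

$$Eu = \sum_{i=1}^{N} P_iU_i, \qquad Ev = \sum_{i=1}^{N} P_iV_i$$

$$\sigma_u^2 = \sum_{i=1}^{N} P_i(U_i - Eu)^2, \qquad \sigma_v^2 = \sum_{i=1}^{N} P_i(V_i - Ev)^2$$

and

$$\sigma_{uv} = \sum_{i=1}^{N} P_i(U_i - Eu)(V_i - Ev)$$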


Exercises 

4.1. If u has possible values 1 and 0 with probabilities P and Q, show that $\sigma_u^2 = PQ$.




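4.2. If u has possible values $U_1, \ldots, U_N$ and v has possible values $V_1, \ldots, V_N$, if $\Pr(u = U_i, v = V_j) = 0$ if $i \ne j$, and if the possible values of u are equally probable, then show that

$$\sigma_{uv} = \frac{1}{N}\sum_{i=1}^{N}(U_i - \bar{U})(V_i - \bar{V})$$

where $\bar{U}$ and $\bar{V}$ are the arithmetic means of the $U_i$ and of the $V_i$.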

4.3. If M has possible values • • •, U where 


and if 


j 

Pr(u = = 5 = 1, • • •, A 


where 

then show that 


A = Vi + « • • + Am 
M Af. - 


where 



Theorem 8. If u and v are independent, then Euv = EuEv. Also, if $u_1, \ldots, u_k$ are independent, then $Eu_1u_2\cdots u_k = Eu_1Eu_2\cdots Eu_k$.

Proof. By definition

$$Euv = \sum_{i=1}^{N}\sum_{j=1}^{M} P_{ij}U_iV_j$$

If u and v are independent, then

$$P_{ij} = P_{i\cdot}P_{\cdot j}$$

where

$$P_{i\cdot} = \Pr(u = U_i), \qquad P_{\cdot j} = \Pr(v = V_j)$$

Substituting, we see that

$$Euv = \sum_{i=1}^{N} P_{i\cdot}U_i \sum_{j=1}^{M} P_{\cdot j}V_j = EuEv$$

To evaluate $Eu_1\cdots u_k$ we note that, if $u_1, \ldots, u_k$ are independent, then the product $u_1\cdots u_{k-1}$ is independent of $u_k$. Hence, by the first part of the theorem we have $Eu_1\cdots u_k = E(u_1\cdots u_{k-1})Eu_k$. Continuing this procedure, we finally prove the theorem in $k - 1$ steps.

Theorem 9. If u and v are independent, then $\sigma_{uv} = 0$.

Proof. From Theorem 7, with both constants set equal to 0, we have $\sigma_{uv} = Euv - EuEv$. The theorem then follows from Theorem 8.

It is possible to have $\sigma_{uv} = 0$ without u and v being independent. For example, let the possible values of u be $\pm 1$, $\pm 2$, each having probability 1/4, and let $v = u^2$. Then

$$Pr(u = +1, v = 1) = Pr(u = -1, v = 1) = \tfrac{1}{4}$$

$$Pr(u = +2, v = 4) = Pr(u = -2, v = 4) = \tfrac{1}{4}$$

and

$$Pr(u = 1, v = 4) = 0$$

whereas

$$Pr(u = 1) = \tfrac{1}{4} \quad \text{and} \quad Pr(v = 4) = \tfrac{1}{2}$$

and hence

$$Pr(u = 1, v = 4) \ne Pr(u = 1)Pr(v = 4)$$

Thus, u and v are dependent. Yet $Eu = 0$ and $Euv = Eu^3 = 0$, so that $\sigma_{uv} = 0$. If $\sigma_{uv} = 0$, we shall say that u and v have zero covariance.

Let us define the correlation coefficient by the equation $\rho_{uv} = \sigma_{uv}/\sigma_u\sigma_v$. Then, if $\sigma_{uv} = 0$, we have $\rho_{uv} = 0$, and we say that u and v are uncorrelated.
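The counterexample above can be verified mechanically. The sketch below (Python, exact arithmetic via `fractions.Fraction`) encodes the joint distribution of u and $v = u^2$, computes $\sigma_{uv} = Euv - EuEv$, and compares $Pr(u = 1, v = 4)$ with $Pr(u = 1)Pr(v = 4)$. The helper names are illustrative only.

```python
from fractions import Fraction

# u takes the values -2, -1, 1, 2 with probability 1/4 each, and v = u^2.
dist_u = {-2: Fraction(1, 4), -1: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4)}
joint = {(x, x * x): p for x, p in dist_u.items()}  # Pr(u = x, v = x^2)

def expect(f):
    """Expected value of f(u, v) under the joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

def marginal(index, value):
    """Pr(u = value) if index == 0, Pr(v = value) if index == 1."""
    return sum(p for key, p in joint.items() if key[index] == value)

Eu = expect(lambda x, y: x)
Ev = expect(lambda x, y: y)
Euv = expect(lambda x, y: x * y)
cov_uv = Euv - Eu * Ev  # zero covariance

# Dependence: Pr(u = 1, v = 4) differs from Pr(u = 1) Pr(v = 4).
p_joint = joint.get((1, 4), Fraction(0))
p_product = marginal(0, 1) * marginal(1, 4)
print(cov_uv, p_joint, p_product)  # 0 0 1/8
```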

Theorem 10. The correlation coefficient $\rho_{uv}$ is such that $-1 \le \rho_{uv} \le 1$, and $\rho_{uv} = \pm 1$ if and only if $u = c + dv$, where c and d are constants, except for values of u and v having zero probability.

Proof. Since

$$E\left(\frac{u - Eu}{\sigma_u} - \frac{v - Ev}{\sigma_v}\right)^2 \ge 0 \qquad (4.9)$$

it follows that, if we expand the square, then, from Theorem 6 and the definitions of variance and covariance, we have

$$1 - 2\rho_{uv} + 1 \ge 0$$

i.e.,

$$1 - \rho_{uv} \ge 0, \quad \text{so that} \quad \rho_{uv} \le 1$$

Similarly, since

$$E\left(\frac{u - Eu}{\sigma_u} + \frac{v - Ev}{\sigma_v}\right)^2 \ge 0$$

it follows that

$$1 + \rho_{uv} \ge 0$$

so that $-1 \le \rho_{uv}$.

If $\rho_{uv} = 1$, then the left member of Inequality 4.9 is zero, and since

$$E\left(\frac{u - Eu}{\sigma_u} - \frac{v - Ev}{\sigma_v}\right)^2 = 0$$

only if the quantity in parentheses is zero, excluding possible values that have zero probability, we have

$$\frac{u - Eu}{\sigma_u} = \frac{v - Ev}{\sigma_v}$$

or

$$u = Eu - \frac{\sigma_u}{\sigma_v}Ev + \frac{\sigma_u}{\sigma_v}v$$

Hence, the third part of Theorem 10 is proved, if $\rho_{uv} = 1$, with $c = Eu - (\sigma_u/\sigma_v)Ev$ and $d = \sigma_u/\sigma_v$. A similar result is obtained if $\rho_{uv} = -1$.
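To see Theorem 10 numerically, the following sketch computes $\rho_{uv}^2$ exactly (squaring avoids irrational square roots) for a hypothetical u that is exactly linear in v and for one that is not. The values of v and the constants c and d are assumptions chosen for illustration.

```python
from fractions import Fraction

def moments(joint):
    """Return (Eu, Ev, var_u, var_v, cov_uv) for a dict {(u, v): probability}."""
    Eu = sum(p * u for (u, v), p in joint.items())
    Ev = sum(p * v for (u, v), p in joint.items())
    var_u = sum(p * (u - Eu) ** 2 for (u, v), p in joint.items())
    var_v = sum(p * (v - Ev) ** 2 for (u, v), p in joint.items())
    cov_uv = sum(p * (u - Eu) * (v - Ev) for (u, v), p in joint.items())
    return Eu, Ev, var_u, var_v, cov_uv

def rho_squared(joint):
    """Square of the correlation coefficient, kept exact by avoiding square roots."""
    _, _, var_u, var_v, cov_uv = moments(joint)
    return cov_uv * cov_uv / (var_u * var_v)

v_values = [1, 2, 4, 7]            # hypothetical values of v, equally probable
p = Fraction(1, len(v_values))
c, d = 5, -3                        # hypothetical constants in u = c + d v
linear = {(c + d * v, v): p for v in v_values}
nonlinear = {(v * v, v): p for v in v_values}

print(rho_squared(linear), moments(linear)[4] < 0)  # 1 True: rho = -1 since d < 0
print(rho_squared(nonlinear) < 1)                    # True: u = v^2 is not linear in v
```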

Remark. Theorem 10 is a special case (applied to $u - Eu$ and $v - Ev$) of the more general theorem $|Euv| \le (Eu^2Ev^2)^{1/2}$ for any random variables u and v having finite second moments.

Just as the theorem on the expected value of a linear combination of random variables plays a fundamental part in sampling, so does the theorem on the variance of a linear combination of random variables.

Theorem 11. If $u = \sum_{i=1}^{k} c_iu_i$, where $u_1, \ldots, u_k$ are random variables and $c_1, \ldots, c_k$ are constants, then

$$\sigma_u^2 = \sum_{i=1}^{k}\sum_{j=1}^{k} c_ic_j\sigma_{u_iu_j}$$

where $\sigma_{u_iu_j}$ is the covariance of $u_i$ and $u_j$ (so that $\sigma_{u_iu_i} = \sigma_{u_i}^2$).

Proof. Since $u - Eu = \sum_{i=1}^{k} c_i(u_i - Eu_i)$, it follows upon squaring both sides of the equation (see Illustration 2.2 of Ch. 2) that

$$E(u - Eu)^2 = E\sum_{i=1}^{k}\sum_{j=1}^{k} c_ic_j(u_i - Eu_i)(u_j - Eu_j)$$

$$= \sum_{i=1}^{k}\sum_{j=1}^{k} c_ic_jE(u_i - Eu_i)(u_j - Eu_j)$$

by Theorem 6. Then, the theorem follows from the definition of the covariance.
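Theorem 11 can be checked on any small joint distribution. The sketch below uses a hypothetical joint distribution of $(u_1, u_2, u_3)$ and hypothetical constants $c_i$, and computes $\sigma_u^2$ once directly from the distribution of u and once as the double sum $\sum_i\sum_j c_ic_j\sigma_{u_iu_j}$.

```python
from fractions import Fraction

# Hypothetical joint distribution of (u1, u2, u3): outcome -> probability.
joint = {
    (1, 0, 2): Fraction(1, 4),
    (2, 1, 0): Fraction(1, 4),
    (0, 3, 1): Fraction(1, 8),
    (4, 1, 1): Fraction(3, 8),
}
c = [2, -1, 3]  # hypothetical constants c_1, c_2, c_3
k = len(c)

def E(f):
    """Expected value of f(outcome) under the joint distribution."""
    return sum(p * f(x) for x, p in joint.items())

means = [E(lambda x, i=i: x[i]) for i in range(k)]

def cov(i, j):
    """sigma_{u_i u_j}; for i == j this is the variance of u_i."""
    return E(lambda x: (x[i] - means[i]) * (x[j] - means[j]))

# Direct route: variance of the single random variable u = sum c_i u_i.
Eu = E(lambda x: sum(ci * xi for ci, xi in zip(c, x)))
var_direct = E(lambda x: (sum(ci * xi for ci, xi in zip(c, x)) - Eu) ** 2)

# Theorem 11: double sum of c_i c_j sigma_{u_i u_j}.
var_theorem = sum(c[i] * c[j] * cov(i, j) for i in range(k) for j in range(k))

print(var_direct, var_theorem)
```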

The following four corollaries are obvious conclusions from Theorem 11. Their proofs are left as exercises.

Corollary 1. If the $u_i$ are uncorrelated, then

$$\sigma_u^2 = \sum_{i=1}^{k} c_i^2\sigma_{u_i}^2$$

Corollary 2. If the $u_i$ are uncorrelated and have equal variances, $\sigma^2$, then

$$\sigma_u^2 = \sigma^2\sum_{i=1}^{k} c_i^2$$

Corollary 3. If the $u_i$ are uncorrelated and have equal variances, $\sigma^2$, and

$$\bar u = \frac{u_1 + \cdots + u_k}{k}$$

then

$$\sigma_{\bar u}^2 = \frac{\sigma^2}{k}$$

Also, if $w = u_1 + \cdots + u_k$, then

$$\sigma_w^2 = k\sigma^2$$

Corollary 4. The variance of a constant times a random variable is the constant squared times the variance of the random variable, i.e.,

$$\sigma_{cu}^2 = c^2\sigma_u^2$$

Illustration 4.2. In stratified sampling the population is first divided into L classes or, as they are usually called, strata, such that each element of the population is in one and only one stratum. Then, samples are independently selected from each stratum. If $u = c_1u_1 + \cdots + c_Lu_L$, where $u_i$ is based on the sample from the ith stratum, then by Theorem 6, and by Theorem 9 together with Corollary 1 to Theorem 11, we have

$$Eu = \sum_{i=1}^{L} c_iEu_i$$

and

$$\sigma_u^2 = \sum_{i=1}^{L} c_i^2\sigma_i^2$$

where $\sigma_i^2$ is the variance of $u_i$. Hence, the knowledge of the expected value and variance of each of the $u_i$ permits us to evaluate the expected value and variance of u when stratified sampling is used. The following theorem is sometimes useful.


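Theorem 12. If $u = \sum_{i=1}^{k} c_iu_i$ and $v = \sum_{j=1}^{m} d_jv_j$, then

$$\sigma_{uv} = \sum_{i=1}^{k}\sum_{j=1}^{m} c_id_j\sigma_{u_iv_j}$$

Proof. The proof almost repeats that of Theorem 11. It is obtained by using Theorem 6 and the fact that, by Illustration 2.2 of Ch. 2, we have

$$(u - Eu)(v - Ev) = \sum_{i=1}^{k}\sum_{j=1}^{m} c_id_j(u_i - Eu_i)(v_j - Ev_j)$$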




Corollary. If $u_i$ and $v_j$, $i \ne j$, are uncorrelated, and $k = m$, then

$$\sigma_{uv} = \sum_{i=1}^{m} c_id_i\sigma_{u_iv_i}$$

and if $c_i = d_i = 1/m$, then

$$\sigma_{uv} = \frac{1}{m^2}\sum_{i=1}^{m}\sigma_{u_iv_i}$$

The proof is omitted.

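Illustration 4.3. (a) If $\sigma_{u_i}^2 = \sigma^2$, $i = 1, \ldots, n$, and $\sigma_{u_iu_j} = c$, $i \ne j$, $i, j = 1, \ldots, n$, then

$$\sigma_{\bar u}^2 = \frac{\sigma^2}{n} + \frac{n - 1}{n}c$$

where $\bar u = \sum_{i=1}^{n} u_i/n$. To see this, we turn to Theorem 11. To reduce that theorem to this special case, we first put $k = n$, $c_i = 1/n$, thus obtaining

$$\sigma_{\bar u}^2 = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_{u_iu_j}$$

We have already noted that $\sigma_{u_iu_i} = \sigma_{u_i}^2$. Also, it is easy to verify that the double summation above contains $n^2$ terms, of which exactly n are of the form $\sigma_{u_iu_i}$, so that the remaining $n^2 - n = n(n - 1)$ terms are of the form $\sigma_{u_iu_j}$, $i \ne j$. Hence

$$\sigma_{\bar u}^2 = \frac{1}{n^2}\sum_{i}\sigma_{u_i}^2 + \frac{1}{n^2}\sum_{i \ne j}\sigma_{u_iu_j}$$

where the first summation has n terms and the second summation has $n(n - 1)$ terms. Therefore

$$\sigma_{\bar u}^2 = \frac{n}{n^2}\sigma^2 + \frac{n(n - 1)}{n^2}c = \frac{\sigma^2}{n} + \frac{n - 1}{n}c$$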
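The result of Illustration 4.3(a) can be confirmed by filling in a covariance matrix with $\sigma^2$ on the diagonal and c elsewhere and applying Theorem 11 with $c_i = 1/n$. The values of n, $\sigma^2$, and c below are hypothetical.

```python
from fractions import Fraction

def var_of_mean(cov_matrix):
    """Theorem 11 with c_i = 1/n: (1/n^2) * sum over all i, j of sigma_{u_i u_j}."""
    n = len(cov_matrix)
    return sum(sum(row) for row in cov_matrix) / (n * n)

def equicorrelated(n, sigma2, c):
    """n x n covariance matrix with sigma2 on the diagonal and c elsewhere."""
    return [[sigma2 if i == j else c for j in range(n)] for i in range(n)]

n, sigma2, c = 6, Fraction(9), Fraction(2)   # hypothetical values
lhs = var_of_mean(equicorrelated(n, sigma2, c))
rhs = sigma2 / n + Fraction(n - 1, n) * c     # formula of Illustration 4.3(a)
print(lhs, rhs)  # 19/6 19/6
```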

(b) Let $u_i$ have two possible values, 1 and 0, and let $Pr(u_i = 1) = P_i$, $Pr(u_i = 0) = Q_i = 1 - P_i$, $i = 1, \ldots, n$. Let $u = \sum_{i=1}^{n} c_iu_i$. Then, if the $u_i$ are uncorrelated, it follows from Corollary 1 to Theorem 11 that

$$\sigma_u^2 = \sum_{i=1}^{n} c_i^2\sigma_{u_i}^2$$

In this special case

$$\sigma_{u_i}^2 = E(u_i - Eu_i)^2 = (1 - P_i)^2P_i + (0 - P_i)^2Q_i = Q_i^2P_i + P_i^2Q_i = P_iQ_i(Q_i + P_i) = P_iQ_i$$

Hence

$$\sigma_u^2 = \sum_{i=1}^{n} c_i^2P_iQ_i$$
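For Illustration 4.3(b), the sketch below enumerates all $2^n$ outcomes of independent 0-1 variables and compares the exact variance of u with $\sum c_i^2P_iQ_i$. Independence is a stronger assumption than the illustration needs, but by Theorem 9 it guarantees that the $u_i$ are uncorrelated. The probabilities $P_i$ and constants $c_i$ are hypothetical.

```python
from fractions import Fraction
from itertools import product

P = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)]   # hypothetical P_i
c = [1, 2, -1]                                          # hypothetical c_i

# Exact first and second moments of u = sum c_i u_i for independent 0/1 variables.
Eu = Fraction(0)
Eu2 = Fraction(0)
for outcome in product([0, 1], repeat=len(P)):
    prob = Fraction(1)
    for x, p in zip(outcome, P):
        prob *= p if x == 1 else 1 - p
    value = sum(ci * x for ci, x in zip(c, outcome))
    Eu += prob * value
    Eu2 += prob * value * value

var_exact = Eu2 - Eu ** 2
var_formula = sum(ci ** 2 * p * (1 - p) for ci, p in zip(c, P))  # sum c_i^2 P_i Q_i
print(var_exact, var_formula)  # 191/144 191/144
```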


Further illustrations of the uses of Theorems 7-12 will be found in 
Chapters 4 and 5. 

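Exercises

4.4. Consider

$$u = \frac{u_1 + u_2 + \cdots + u_M}{M}$$

where $u_1, u_2, \ldots, u_M$ are random variables. Show that

$$Eu = \frac{1}{M}\sum_{i=1}^{M} Eu_i$$

Now assume that $u_i$ takes on the values $U_{i1}, U_{i2}, \ldots, U_{iN}$ with equal probability. Show that $Eu = \bar U$, where

$$\bar U = \frac{1}{MN}\sum_{i=1}^{M}\sum_{\alpha=1}^{N} U_{i\alpha}$$

Assume, also, that

$$Pr(u_i = U_{i\alpha}, u_j = U_{j\beta}) = 0, \qquad \alpha \ne \beta; \quad \alpha, \beta = 1, 2, \ldots, N$$

Then, for $i \ne j$,

$$\sigma_{u_iu_j} = \frac{1}{N}\sum_{\alpha=1}^{N}(U_{i\alpha} - \bar U_i)(U_{j\alpha} - \bar U_j)$$

where

$$\bar U_i = \frac{1}{N}\sum_{\alpha=1}^{N} U_{i\alpha} \quad \text{and} \quad \bar U_j = \frac{1}{N}\sum_{\alpha=1}^{N} U_{j\alpha}$$

Evaluate $\sigma_u^2$.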

4.5. Let $u = 3u_1 + u_2$, where $\sigma_{u_1}^2 = 2$, $\sigma_{u_2}^2 = .25$, and $\sigma_{u_1u_2} = .6$. Evaluate $\sigma_u^2$.

4.6. Show that

$$-\sigma_u\sigma_v \le \sigma_{uv} \le \sigma_u\sigma_v$$

for all random variables u and v having finite variances.

4.7. Show that

$$0 \le E|uv| \le (Eu^2)^{1/2}(Ev^2)^{1/2}$$

for all random variables.


5. Conditional expectation. The following definitions and theorems are important in evaluating expected values associated with multi-stage sampling.

Let $P_i(B^*)$ be the conditional probability that the random variable u takes on the value $U_i$, subject to the occurrence of the event $B^*$, i.e.,

$$P_i(B^*) = Pr(u = U_i|B^*)$$

Then, by definition, the conditional expectation of the random variable u, given $B^*$, is

$$E(u|B^*) = \sum_{i=1}^{N} P_i(B^*)U_i$$

In other words, to calculate the conditional expectation we apply the same procedure as that for the expected value, except that we use conditional probabilities.

Illustration 5.1. Let u and v be two random variables with $Pr(u = U_i, v = V_j) = P_{ij}$, $i = 1, \ldots, N$; $j = 1, \ldots, M$. Compute $E(u|v = V_j)$. Since $v = V_j$ if any of the elementary events $u = U_i$, $v = V_j$, $i = 1, \ldots, N$, should occur, it follows from the definition of the probability of an event that $Pr(v = V_j) = \sum_{i=1}^{N} P_{ij} = P_{.j}$. By the definition of conditional probability,

$$Pr(u = U_i|v = V_j) = \frac{Pr(u = U_i, v = V_j)}{Pr(v = V_j)} = \frac{P_{ij}}{P_{.j}}$$

Hence, defining $B^*$ to be the event $v = V_j$, we have

$$E(u|v = V_j) = \sum_{i=1}^{N} U_i\frac{P_{ij}}{P_{.j}}$$
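The computation in Illustration 5.1 amounts to normalizing one column of the joint probability table. A minimal Python sketch, assuming a small hypothetical table `P[i][j]` $= Pr(u = U_i, v = V_j)$:

```python
from fractions import Fraction

U = [1, 2, 5]   # hypothetical possible values of u
V = [0, 1]      # hypothetical possible values of v
# P[i][j] = Pr(u = U[i], v = V[j]); hypothetical entries summing to 1.
P = [[Fraction(1, 10), Fraction(2, 10)],
     [Fraction(3, 10), Fraction(1, 10)],
     [Fraction(1, 10), Fraction(2, 10)]]

def conditional_expectation_u_given_v(j):
    """E(u | v = V_j) = sum_i U_i * P_ij / P_.j (Illustration 5.1)."""
    p_dot_j = sum(P[i][j] for i in range(len(U)))
    if p_dot_j == 0:
        raise ValueError("Pr(v = V_j) = 0: conditional expectation undefined")
    return sum(U[i] * P[i][j] for i in range(len(U))) / p_dot_j

for j, v in enumerate(V):
    print(v, conditional_expectation_u_given_v(j))  # 0 12/5, then 1 14/5
```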

Illustration 5.2. (a) Let u be a random variable with possible values $U_1, \ldots, U_N$. Then $E(u|u = U_1) = U_1$.

Proof. Since

$$Pr(u = U_i, u = U_1) = Pr(u = U_1) \quad \text{if } i = 1$$
$$= 0 \quad \text{otherwise}$$

it follows that

$$Pr(u = U_i|u = U_1) = 1 \quad \text{if } i = 1$$
$$= 0 \quad \text{otherwise}$$

When we substitute in the definition of conditional expectation, the result follows.

(b) Let u and v be random variables with possible values $U_1, \ldots, U_N$ and $V_1, \ldots, V_M$. Then $E(uv|u = U_1) = U_1E(v|u = U_1)$. If $i = 1$,

$$Pr(u = U_i, v = V_j|u = U_1) = \frac{Pr(u = U_1, v = V_j)}{Pr(u = U_1)} = Pr(v = V_j|u = U_1)$$

If $i \ne 1$,

$$Pr(u = U_i, v = V_j, u = U_1) = 0$$

and hence

$$Pr(u = U_i, v = V_j|u = U_1) = 0$$

Hence, by the definition of conditional expectation, we have

$$E(uv|u = U_1) = \sum_{j=1}^{M} U_1V_jPr(v = V_j|u = U_1) = U_1E(v|u = U_1)$$

It is clear that a theorem comparable to Theorem 6 can be stated in terms of conditional expectations, as follows.

Theorem 13. The conditional expected value of a linear combination of random variables is the same linear combination of the conditional expected values of these random variables, the same condition being used throughout. In symbols, if $u_1, \ldots, u_k$ are random variables, and $u = c_1u_1 + \cdots + c_ku_k$, then

$$E(u|B^*) = \sum_{i=1}^{k} c_iE(u_i|B^*)$$

The proof directly parallels that given for Theorem 6. It is left as an exercise.

Exercise 5.1. Prove theorems comparable to Theorems 3, 4, and 5 stated in 
terms of conditional expectations. 

Sometimes the event $B^*$ is one of the states of a random event $b^*$. For example, in Illustration 5.1 the event $v = V_j$ is one of the states of the random event "v takes on one of its possible values." In such a case $E(u|B^*)$ is one of the values of a random variable. If the random event $b^*$ has possible states $B_1^*, \ldots, B_M^*$ with probabilities $P_1, \ldots, P_M$, we define $E(u|b^*)$ to be the random variable which has possible values $E(u|B_j^*)$, $j = 1, \ldots, M$, with probabilities $P_1, \ldots, P_M$. Then we have the basic computing theorem:

Theorem 14. The expected value of a random variable is equal to the expected value of the conditional expectation of that random variable, the condition being a random event. In symbols, if u is a random variable and $b^*$ is a random event, then

$$Eu = E[E(u|b^*)] = \sum_{j=1}^{M} P_jE(u|B_j^*)$$

where $P_j = Pr(b^* = B_j^*)$.

Proof. By the definition of conditional expectation

$$E(u|B_j^*) = \sum_{i=1}^{N} U_iP_i(B_j^*)$$

and by the definition of conditional probability

$$P_i(B_j^*) = \frac{Pr(u = U_i, b^* = B_j^*)}{P_j}$$

so that

$$\sum_{j=1}^{M} P_jE(u|B_j^*) = \sum_{j=1}^{M}\sum_{i=1}^{N} U_iPr(u = U_i, b^* = B_j^*) = \sum_{i=1}^{N} U_i\sum_{j=1}^{M} Pr(u = U_i, b^* = B_j^*)$$

But the event $u = U_i$ will occur if and only if one of the elementary events $u = U_i$, $b^* = B_j^*$, $j = 1, \ldots, M$, should occur. Hence, by Ch. 2, Theorem 5 (p. 28), we have

$$Pr(u = U_i) = \sum_{j=1}^{M} Pr(u = U_i, b^* = B_j^*)$$

so that

$$\sum_{j=1}^{M} P_jE(u|B_j^*) = \sum_{i=1}^{N} U_iPr(u = U_i) = Eu$$

Corollary 1. Let u and v be random variables, and let $E(u|v)$ denote the random variable having possible values $E(u|v = V_j)$ with probabilities $Pr(v = V_j)$, $j = 1, \ldots, M$, where $V_1, \ldots, V_M$ are the possible values of v. Then

$$Eu = E[E(u|v)]$$

The proof consists in using Theorem 14, the possible state $B_j^*$ being $v = V_j$, $j = 1, \ldots, M$.

Corollary 2. With the same conditions as those given in Corollary 1,

$$Euv = E[vE(u|v)]$$

Proof. Since, as is shown in detail in Illustration 5.2(b),

$$E(uv|v = V_j) = V_jE(u|v = V_j), \qquad j = 1, \ldots, M$$

we have

$$E(uv|v) = vE(u|v)$$

Then, by Theorem 14,

$$Euv = E[E(uv|v)] = E[vE(u|v)]$$
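Corollaries 1 and 2 can be checked on a small joint table. The sketch below, using hypothetical values and probabilities, computes Eu and Euv directly and compares them with $E[E(u|v)]$ and $E[vE(u|v)]$ built from the conditional expectations $E(u|v = V_j)$.

```python
from fractions import Fraction

U = [1, 2, 5]   # hypothetical possible values of u
V = [0, 1]      # hypothetical possible values of v
P = [[Fraction(1, 10), Fraction(2, 10)],   # P[i][j] = Pr(u = U[i], v = V[j])
     [Fraction(3, 10), Fraction(1, 10)],
     [Fraction(1, 10), Fraction(2, 10)]]
N, M = len(U), len(V)

p_v = [sum(P[i][j] for i in range(N)) for j in range(M)]    # Pr(v = V_j), all > 0 here
cond_Eu = [sum(U[i] * P[i][j] for i in range(N)) / p_v[j]   # E(u | v = V_j)
           for j in range(M)]

# Direct computations from the joint table.
Eu = sum(U[i] * P[i][j] for i in range(N) for j in range(M))
Euv = sum(U[i] * V[j] * P[i][j] for i in range(N) for j in range(M))

# Corollary 1: Eu = E[E(u|v)];  Corollary 2: Euv = E[v E(u|v)].
E_of_cond = sum(p_v[j] * cond_Eu[j] for j in range(M))
E_v_cond = sum(p_v[j] * V[j] * cond_Eu[j] for j in range(M))
print(Eu, E_of_cond, Euv, E_v_cond)  # 13/5 13/5 7/5 7/5
```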

Illustration 5.3. Suppose that a population consists of 3 blocks on which 5, 6, and 9 families live. A block is selected with probability proportionate to size, and from the selected block 1 family is selected by an epsem. Let u be the number of persons in the selected family. Compute Eu. Let $b^*$ be the random event having 3 states $B_1^*$, $B_2^*$, and $B_3^*$, where, if $B_i^*$ occurs, the ith block is selected. Then

$$Pr(b^* = B_1^*) = \tfrac{5}{20}, \quad Pr(b^* = B_2^*) = \tfrac{6}{20}, \quad \text{and} \quad Pr(b^* = B_3^*) = \tfrac{9}{20}$$

since the block is being selected with probability proportionate to size. Now,

$$E(u|B_i^*) = \frac{1}{M_i}\sum_{j=1}^{M_i} U_{ij} = \frac{U_i}{M_i}$$

where $U_{ij}$ is the number of persons in the jth family of the ith block, $M_i$ is the number of families in the ith block, and $U_i$ is the number of persons in the ith block. Therefore, by Theorem 14, we have

$$Eu = \sum_{i=1}^{3} Pr(b^* = B_i^*)E(u|B_i^*) = \sum_{i=1}^{3}\frac{M_i}{M}\frac{U_i}{M_i} = \frac{U}{M}$$

where

$$M = M_1 + M_2 + M_3 = 20, \qquad U = U_1 + U_2 + U_3$$

and Eu is the average number of persons per family.
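Illustration 5.3 can be reproduced with concrete numbers. The family sizes below are hypothetical (only the block sizes 5, 6, and 9 come from the text). The sketch applies Theorem 14 with $Pr(b^* = B_i^*) = M_i/M$ and $E(u|B_i^*) = U_i/M_i$, and compares the result with the population average number of persons per family.

```python
from fractions import Fraction

# Hypothetical numbers of persons in each family; block sizes 5, 6, 9 as in the text.
blocks = [
    [2, 4, 3, 5, 1],
    [3, 3, 2, 6, 4, 2],
    [1, 2, 2, 3, 5, 4, 3, 2, 6],
]
M = sum(len(b) for b in blocks)   # total number of families (20)

# Theorem 14: Eu = sum_i Pr(block i) * E(u | block i), with Pr(block i) = M_i / M
# (selection with probability proportionate to size) and E(u | block i) = U_i / M_i
# (one family chosen with equal probability within the block).
Eu = sum(Fraction(len(b), M) * Fraction(sum(b), len(b)) for b in blocks)

# Direct check: the average number of persons per family in the population.
average_per_family = Fraction(sum(sum(b) for b in blocks), M)
print(Eu, average_per_family)  # 63/20 63/20
```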

Further illustrations of the uses of Theorems 13 and 14 may be found 
in Chapters 4, 6, 7, and 9. 

Exercises

5.2. Let v have possible values 0, 1, 2 with probabilities $P_0$, $P_1$, $P_2$, and let $E(u|v = i) = c_i$, where $c_0 = 3$, $c_1 = -1$, and $c_2 = 7$. Then evaluate Eu.

5.3. Let u have possible values $U_1, \ldots, U_N$ and v have possible values $V_1, \ldots, V_M$, and let $P_{ij} = Pr(u = U_i, v = V_j)$. If $U_i = i$, $i = 1, \ldots, N$, and $V_j = j$, $j = 1, \ldots, M$, evaluate:

$$E(u|v < 3), \qquad E(uv|u + v < 4), \qquad E(u^2 - v^2|u = 1)$$


6. Conditional variance and covariance. Just as conditional expectation is useful in multi-stage and other sampling problems, so are the conditional variance and covariance.

Let u be a random variable and let $B^*$ be an event. Then the conditional variance of u, given $B^*$, is defined by the equation

$$\sigma^2_{u|B^*} = E\{[u - E(u|B^*)]^2|B^*\} \qquad (6.1)$$

Thus, to compute the conditional variance, we first compute the conditional expectation and then compute the conditional expected value of the square of the deviation between the random variable and its conditional expectation.

The conditional covariance of u and v, given $B^*$, is similarly defined:

$$\sigma_{uv|B^*} = E\{[u - E(u|B^*)][v - E(v|B^*)]|B^*\} \qquad (6.2)$$






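Illustration 6.1. (a) Let u and v be random variables with possible values $U_1, \ldots, U_N$ and $V_1, \ldots, V_M$, and let $P_{ij} = Pr(u = U_i, v = V_j)$. Then, if $Pr(v = V_j) > 0$, it follows that

$$Pr(u = U_i|v = V_j) = \frac{P_{ij}}{P_{.j}}$$

where

$$P_{.j} = \sum_{i=1}^{N} P_{ij}$$

and

$$E(u|v = V_j) = \sum_{i=1}^{N} U_i\frac{P_{ij}}{P_{.j}}$$

Also, using Eq. 6.1 with $B^*$ being the event $v = V_j$, we have

$$\sigma^2_{u|v=V_j} = \sum_{i=1}^{N}[U_i - E(u|v = V_j)]^2\frac{P_{ij}}{P_{.j}}$$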

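(b) If $B^*$ is the event "$v = V_1$ or $V_2$ or $\cdots$ or $V_m$" ($m < M$), then

$$Pr(u = U_i, B^*) = \sum_{j=1}^{m} P_{ij}$$

and

$$Pr(B^*) = \sum_{i=1}^{N}\sum_{j=1}^{m} P_{ij}$$

so that

$$Pr(u = U_i|B^*) = \frac{\sum_{j=1}^{m} P_{ij}}{\sum_{i=1}^{N}\sum_{j=1}^{m} P_{ij}}$$

Then

$$E(u|B^*) = \sum_{i=1}^{N} U_iPr(u = U_i|B^*)$$

and

$$\sigma^2_{u|B^*} = \sum_{i=1}^{N}[U_i - E(u|B^*)]^2\frac{\sum_{j=1}^{m} P_{ij}}{\sum_{i=1}^{N}\sum_{j=1}^{m} P_{ij}}$$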

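(c) Let $B^*$ be defined as in Illustration 6.1(b) above. Then

$$Pr(u = U_i, v = V_j, B^*) = P_{ij}, \qquad i = 1, \ldots, N; \; j = 1, \ldots, m$$
$$= 0, \qquad i = 1, \ldots, N; \; j = m + 1, \ldots, M$$

so that

$$Pr(u = U_i, v = V_j|B^*) = \frac{P_{ij}}{Pr(B^*)}, \qquad i = 1, \ldots, N; \; j = 1, \ldots, m$$
$$= 0 \quad \text{otherwise}$$

and

$$Pr(v = V_j|B^*) = \frac{P_{.j}}{Pr(B^*)}, \qquad j = 1, \ldots, m$$
$$= 0 \quad \text{otherwise}$$

Hence

$$E(v|B^*) = \sum_{j=1}^{m} V_jPr(v = V_j|B^*)$$

and

$$\sigma_{uv|B^*} = \sum_{i=1}^{N}\sum_{j=1}^{m}[U_i - E(u|B^*)][V_j - E(v|B^*)]\frac{P_{ij}}{Pr(B^*)}$$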


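If $b^*$ is a random event having states $B_1^*, \ldots, B_M^*$ with probabilities $P_1, \ldots, P_M$, then by $\sigma^2_{u|b^*}$ we mean the random variable having possible values $\sigma^2_{u|B_1^*}, \ldots, \sigma^2_{u|B_M^*}$ with probabilities $P_1, \ldots, P_M$. Similarly, $\sigma_{uv|b^*}$ is a random variable having possible values $\sigma_{uv|B_1^*}, \ldots, \sigma_{uv|B_M^*}$ with probabilities $P_1, \ldots, P_M$. Then, from the definition of expected value, it follows that

$$E\sigma^2_{u|b^*} = \sum_{j=1}^{M} P_j\sigma^2_{u|B_j^*}$$

$$E\sigma_{uv|b^*} = \sum_{j=1}^{M} P_j\sigma_{uv|B_j^*}$$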

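Illustration 6.2. In Illustration 6.1(a) let $b^*$ be the random event having possible states $B_1^*, \ldots, B_M^*$, where $B_j^*$ is the event $v = V_j$, and let

$$Pr(b^* = B_j^*) = Pr(v = V_j) = \sum_{i=1}^{N} P_{ij} = P_{.j}$$

Then $\sigma^2_{u|b^*}$ is the random variable having possible values $\sigma^2_{u|B_j^*}$, and

$$Pr(\sigma^2_{u|b^*} = \sigma^2_{u|B_j^*}) = P_{.j}, \qquad j = 1, \ldots, M$$

Also

$$E\sigma^2_{u|b^*} = \sum_{j=1}^{M} P_{.j}\sigma^2_{u|v=V_j}$$

where $\sigma^2_{u|v=V_j}$ was evaluated in Illustration 6.1(a).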

Theorem 15. If u and v are random variables, and $b^*$ is a random event, then

$$\sigma_u^2 = E\sigma^2_{u|b^*} + \sigma^2_{E(u|b^*)} \qquad (6.3)$$

and

$$\sigma_{uv} = E\sigma_{uv|b^*} + \sigma_{E(u|b^*)E(v|b^*)} \qquad (6.4)$$

Proof. Since Eq. 6.3 is a special case of Eq. 6.4, we shall prove only Eq. 6.4. Since $u - Eu = u - E(u|b^*) + E(u|b^*) - Eu$ and $v - Ev = v - E(v|b^*) + E(v|b^*) - Ev$, we have

$$\sigma_{uv} = E[u - E(u|b^*)][v - E(v|b^*)] + E[u - E(u|b^*)][E(v|b^*) - Ev]$$
$$\qquad + E[E(u|b^*) - Eu][v - E(v|b^*)] + E[E(u|b^*) - Eu][E(v|b^*) - Ev]$$

By Theorem 14 we have

$$E[E(u|b^*)] = Eu$$
$$E[uE(v|b^*)] = E[E(u|b^*)E(v|b^*)]$$

so that

$$E[u - E(u|b^*)][E(v|b^*) - Ev] = E[uE(v|b^*) - uEv - E(u|b^*)E(v|b^*) + E(u|b^*)Ev] = 0$$

Similarly

$$E[E(u|b^*) - Eu][v - E(v|b^*)] = 0$$

Now

$$E[u - E(u|b^*)][v - E(v|b^*)] = E\sigma_{uv|b^*}$$

Finally

$$E[E(u|b^*) - Eu][E(v|b^*) - Ev] = \sigma_{E(u|b^*)E(v|b^*)}$$

by the definition of covariance. Hence

$$\sigma_{uv} = E\sigma_{uv|b^*} + \sigma_{E(u|b^*)E(v|b^*)}$$

as was to be proved.
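Equations 6.3 and 6.4 can be checked numerically. The sketch below takes a hypothetical joint distribution of the state of $b^*$ and the pair (u, v), computes $\sigma_{uv}$ directly, and compares it with $E\sigma_{uv|b^*} + \sigma_{E(u|b^*)E(v|b^*)}$. Passing (u, u) instead of (u, v) checks Eq. 6.3. The state labels and the helper names are illustrative.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical joint distribution: (state of b*, u, v) -> probability.
joint = {
    ("B1", 1, 2): Fraction(1, 8), ("B1", 3, 1): Fraction(1, 8),
    ("B2", 2, 2): Fraction(1, 4), ("B2", 6, 5): Fraction(1, 8),
    ("B3", 0, 4): Fraction(1, 4), ("B3", 4, 0): Fraction(1, 8),
}

def cov(items):
    """Covariance of (x, y) for items [(prob, x, y)], probabilities renormalized."""
    total = sum(p for p, _, _ in items)
    ex = sum(p * x for p, x, _ in items) / total
    ey = sum(p * y for p, _, y in items) / total
    return sum(p * (x - ex) * (y - ey) for p, x, y in items) / total

def decompose(outcomes):
    """outcomes: list of (state, x, y, prob). Returns both sides of Eq. 6.4."""
    lhs = cov([(p, x, y) for _, x, y, p in outcomes])
    by_state = defaultdict(list)
    for b, x, y, p in outcomes:
        by_state[b].append((p, x, y))
    expected_cond_cov = Fraction(0)
    cond_means = []
    for items in by_state.values():
        pj = sum(p for p, _, _ in items)                          # Pr(b* = B_j)
        expected_cond_cov += pj * cov(items)                      # E sigma_{xy|b*}
        cond_means.append((pj,
                           sum(p * x for p, x, _ in items) / pj,  # E(x | B_j)
                           sum(p * y for p, _, y in items) / pj)) # E(y | B_j)
    return lhs, expected_cond_cov + cov(cond_means)

outcomes = [(b, u, v, p) for (b, u, v), p in joint.items()]
print(decompose(outcomes))                                   # Eq. 6.4 with (u, v)
print(decompose([(b, u, u, p) for b, u, _, p in outcomes]))  # Eq. 6.3: x = y = u
```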


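Exercises

6.1. Suppose that a random variable $u_1$ has possible values $U_1, U_2, \ldots, U_N$ with equal probability, and that if $u_1$ assumes the value $U_i$, then the random variable $u_2$ takes on the possible values $U_1, U_2, \ldots, U_N$, except $U_i$, with equal probability. Show that $\sigma^2_{u_2} = \sigma^2$, where

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(U_i - \bar U)^2$$

Hint:

$$\sigma^2_{u_2} = E\sigma^2_{u_2|u_1} + \sigma^2_{E(u_2|u_1)}$$

$$Eu_1 = \frac{\sum_{i=1}^{N} U_i}{N} = \bar U$$

$$E(u_2|u_1 = U_i) = \frac{N\bar U - U_i}{N - 1}$$

so that

$$\sigma^2_{E(u_2|u_1)} = \frac{\sigma^2}{(N - 1)^2}$$

and, similarly,

$$E\sigma^2_{u_2|u_1} = \frac{N(N - 2)}{(N - 1)^2}\sigma^2$$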


6.2. Give the details of the proof that

$$\sigma^2_{v|v} = 0$$

and

$$\sigma_{uv|v} = 0$$

For some illustrations of the application of the above theorems in the 
evaluation of expected values and variances of estimates for some standard 
survey designs, see Sec. 2, Ch. 4; Sec. 1, Ch. 5; and Sec. 1, Ch. 6. 

Often, as in sampling designs involving three or more stages, we need to consider the following extension of Theorem 15.

Let $b_1^*, \ldots, b_K^*$ be random events, and denote by

$$E(u|j!)$$

the expected value of u, given the results of the first j random events, $j = 1, \ldots, K$. Define

$$z_j = E(u|j!) \qquad (6.5)$$

so that

$$z_K = E(u|K!) = u$$

By

$$\sigma^2_{z_j|(j-1)!}, \qquad j = 1, \ldots, K$$

is meant the variance of $z_j$, holding only the first $j - 1$ random events fixed, so that

$$\sigma^2_{z_1|0!} = \sigma^2_{E(u|1!)}$$

Finally, by

$$E\sigma^2_{z_j|(j-1)!}$$

is meant the expected value of $\sigma^2_{z_j|(j-1)!}$ over all possible states of the first $j - 1$ random events. Then we have the following theorem:




Theorem 16. If u is a random variable, then

$$\sigma_u^2 = \sigma^2_{z_1} + E\sigma^2_{z_2|1!} + \cdots + E\sigma^2_{z_K|(K-1)!}$$

where $z_j$ is defined by Eq. 6.5.

The proof is an immediate extension of Theorem 15.

Corollary. In applying Theorem 16 to a K-stage sampling design, u is the sample estimate and the jth random event is "selection of jth-stage sampling units," $j = 1, 2, \ldots, K$. Thus, $z_j$ is the conditional expected value of u, holding the results of the first j stages of sampling fixed; $\sigma^2_{z_j|(j-1)!}$ is the conditional variance of $z_j$, holding the results of the first $j - 1$ stages fixed; and $E\sigma^2_{z_j|(j-1)!}$ is the expected value of this conditional variance over all possible results of the first $j - 1$ stages of sampling, $j = 1, 2, \ldots, K$. Note that $z_K = u$, since the conditional expectation of u with all K stages fixed is u itself.

The quantity

$$E\sigma^2_{z_j|(j-1)!}, \qquad j = 1, 2, \ldots, K$$

is referred to as the contribution of the jth stage of sampling to the variance.

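In multi-stage sampling we use the following notation:

$$z_j = E(u|[1, 2, \ldots, j]), \qquad j = 1, 2, \ldots, K$$

$$\sigma^2_{z_j|[1, 2, \ldots, j-1]}, \qquad j = 1, 2, \ldots, K$$

Thus, for a three-stage design we write

$$\sigma_u^2 = \sigma^2_{z_1} + E\sigma^2_{z_2|[1]} + E\sigma^2_{z_3|[1,2]}$$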

A result similar to Theorem 16 can be obtained for $\sigma_{uv}$ using the following notation. The quantities $w_j = E(v|j!)$, $\sigma_{z_jw_j|(j-1)!}$, and $E\sigma_{z_jw_j|(j-1)!}$ are defined as were the similar expressions in u and the z's; for example, $\sigma_{z_jw_j|(j-1)!}$ is the covariance of $z_j$ and $w_j$, holding the first $j - 1$ random events fixed.

Theorem 17. If u and v are random variables, then

$$\sigma_{uv} = \sigma_{z_1w_1} + E\sigma_{z_2w_2|1!} + \cdots + E\sigma_{z_Kw_K|(K-1)!}$$

The proof is omitted.

Corollary. A result similar to the corollary to Theorem 16 can be obtained for $\sigma_{uv}$ by applying Theorem 17 to a K-stage sampling design.

In the application of the theorems above, if one wishes to determine the contributions from all stages of sampling, it may sometimes be convenient to apply Theorem 15 repeatedly. When one wants to determine the contribution to the variance or covariance from a particular stage of sampling, the corollary to Theorem 16 or 17 is appropriate. These corollaries were used in developing the variance for a three-stage sampling design in Sec. 4 of Ch. 7. The derivation of the variance for a multi-stage sampling design by successive applications of Theorem 15 is given in Chapter 9. The latter approach makes it possible to indicate, for example, the contribution to the variance from the first stage of sampling and the combined contribution from all subsequent stages.
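Theorem 16 can be illustrated by exact enumeration for a small three-stage design. The sketch below assumes a hypothetical population of primary units, each containing secondary units, each containing elements, with one unit drawn with equal probability at each stage and u the value of the selected element. It computes the three stage contributions $\sigma^2_{z_1}$, $E\sigma^2_{z_2|[1]}$, and $E\sigma^2_{z_3|[1,2]}$ and compares their sum with $\sigma_u^2$ computed directly.

```python
from fractions import Fraction

# Hypothetical three-stage population: PSUs -> SSUs -> element values.
# One unit is drawn with equal probability at each stage; u is the selected value.
population = [
    [[1, 3], [2, 2, 5]],
    [[4, 6, 2], [0, 1]],
    [[3, 3], [7, 1], [2, 4]],
]

def mean(xs):
    return sum(xs, Fraction(0)) / len(xs)

def var(xs):
    """Variance of equally probable values."""
    m = mean(xs)
    return mean([(x - m) ** 2 for x in xs])

# z_1 = E(u | [1]) for each PSU; z_2 = E(u | [1, 2]) for each SSU; z_3 = u.
z1 = [mean([mean(ssu) for ssu in psu]) for psu in population]
z2 = [[mean(ssu) for ssu in psu] for psu in population]

stage1 = var(z1)                                                        # sigma^2_{z_1}
stage2 = mean([var(row) for row in z2])                                 # E sigma^2_{z_2|[1]}
stage3 = mean([mean([var(ssu) for ssu in psu]) for psu in population])  # E sigma^2_{z_3|[1,2]}

# Direct variance of u: each element has probability
# (1 / #PSUs) * (1 / #SSUs in its PSU) * (1 / #elements in its SSU).
dist = []
for psu in population:
    for ssu in psu:
        for x in ssu:
            dist.append((Fraction(1, len(population) * len(psu) * len(ssu)), x))
Eu = sum(p * x for p, x in dist)
var_u = sum(p * (x - Eu) ** 2 for p, x in dist)

print(var_u, stage1 + stage2 + stage3)
```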

7. The Tchebycheff inequality. Convergence in probability. Consistency.

In the preceding sections, we have developed the tools by which we compute the expected values and variances of estimates from survey designs. In order to utilize these quantities, however, it must be possible to evaluate, at least approximately, the probabilities of specified differences between the estimates and the quantities we wish to estimate. It is to questions arising from this need that we turn in the present section.

The Tchebycheff and Markov inequalities. One of the more remarkable theorems of the theory of probability is that due to Tchebycheff, by which bounds are derived for the probabilities of the difference between any random variable and a preassigned value, the latter usually being taken to be the expected value of the random variable. The Markov inequality is a generalization of the Tchebycheff inequality.

Theorem 18 (the Tchebycheff inequality). Let u be a random variable and let c be a constant. Then, for any $\varepsilon > 0$,

$$Pr(|u - c| \ge \varepsilon) \le \frac{\sigma_u^2 + (Eu - c)^2}{\varepsilon^2}$$

$$Pr(|u - c| < \varepsilon) \ge 1 - \frac{\sigma_u^2 + (Eu - c)^2}{\varepsilon^2}$$

so that if $c = Eu$ the inequalities become

$$Pr(|u - Eu| \ge \varepsilon) \le \frac{\sigma_u^2}{\varepsilon^2} \qquad (7.1)$$

$$Pr(|u - Eu| < \varepsilon) \ge 1 - \frac{\sigma_u^2}{\varepsilon^2}$$

Proof. Suppose that u has possible values $U_1, \ldots, U_N$ with probabilities $P_1, \ldots, P_N$. Suppose that, of the N differences $U_1 - c, U_2 - c, \ldots, U_N - c$, exactly m of them, say $U_1 - c, \ldots, U_m - c$, have absolute values $\ge \varepsilon$, while the other $N - m$ have absolute values $< \varepsilon$. Then

$$Pr(|u - c| \ge \varepsilon) = P_1 + \cdots + P_m$$

By definition

$$E(u - c)^2 = \sum_{i=1}^{N} P_i(U_i - c)^2 \ge \sum_{i=1}^{m} P_i(U_i - c)^2 \ge \varepsilon^2\sum_{i=1}^{m} P_i$$

since, if $|U_i - c| \ge \varepsilon$, then $(U_i - c)^2 \ge \varepsilon^2$. Hence

$$\frac{E(u - c)^2}{\varepsilon^2} \ge \sum_{i=1}^{m} P_i = Pr(|u - c| \ge \varepsilon)$$

Now, by Theorem 7 of Sec. 4 (p. 51) of this chapter,

$$E(u - c)^2 = \sigma_u^2 + (Eu - c)^2$$

which completes the proof of the first part of the theorem. The second part follows from the fact that

$$Pr(|u - c| \ge \varepsilon) + Pr(|u - c| < \varepsilon) = 1$$

The importance of the Tchebycheff inequality arises from its generality 
and its usefulness in proving limiting results. The bound set by the 
inequality is valid for any random variable. However, if something more 
than the theorem requires is known about the random variable, then 
closer bounds can usually be set. 
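The looseness of the general bound is easy to see on a specific distribution. The sketch below, for a hypothetical discrete distribution of u, compares the exact probability $Pr(|u - Eu| \ge \varepsilon)$ with the Tchebycheff bound $\sigma_u^2/\varepsilon^2$ for a few values of $\varepsilon$. For small $\varepsilon$ the bound can exceed 1 and is then uninformative.

```python
from fractions import Fraction

# Hypothetical distribution of u: value -> probability.
dist = {0: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(4, 10),
        5: Fraction(2, 10), 9: Fraction(1, 10)}

Eu = sum(p * x for x, p in dist.items())
var_u = sum(p * (x - Eu) ** 2 for x, p in dist.items())

for eps in (2, 3, 4):
    exact = sum(p for x, p in dist.items() if abs(x - Eu) >= eps)  # Pr(|u - Eu| >= eps)
    bound = var_u / eps ** 2                                       # Tchebycheff bound
    print(eps, exact, bound)
```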

Some general conclusions may be given at once.

Corollary 1 (the Markov inequality). Let u be a random variable. Then, for any $\varepsilon > 0$,

$$Pr(|u - Eu| \ge \varepsilon) \le \frac{E|u - Eu|^k}{\varepsilon^k}$$

$$Pr(|u - Eu| < \varepsilon) \ge 1 - \frac{E|u - Eu|^k}{\varepsilon^k}$$

if $k \ge 1$.

Proof. Since, by the definition of expected value,

$$E|u - Eu|^k = \sum_{i} P_i|U_i - Eu|^k$$

the steps in proving Corollary 1 follow exactly those of Theorem 18 and are left as an exercise.


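Corollary 2. Let $u_1, \ldots, u_n$ be uncorrelated and have the same expected value U and the same variance $\sigma^2$, and let $\bar u_n = (u_1 + \cdots + u_n)/n$. Then, for any $\varepsilon > 0$,

$$\lim_{n \to \infty} Pr(|\bar u_n - U| \ge \varepsilon) = 0$$

Proof. From Corollary 3 to Theorem 11, we have

$$\sigma_{\bar u_n}^2 = \frac{\sigma^2}{n}$$

Using the Tchebycheff inequality, $Pr(|\bar u_n - U| \ge \varepsilon) \le \sigma^2/n\varepsilon^2$, and letting $n \to \infty$, the result is immediate.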

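We can easily generalize Corollary 2 to the case where the random variables are correlated. Let $u_1, u_2, \ldots$ have the common expected value U, and let $\bar u_n$ be the arithmetic mean of $u_1, u_2, \ldots, u_n$. Then, for any $\varepsilon > 0$, we have

$$Pr(|\bar u_n - U| \ge \varepsilon) \le \frac{1}{n^2\varepsilon^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_{u_iu_j}$$

and hence, if

$$\lim_{n \to \infty}\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_{u_iu_j} = 0$$

it follows that

$$\lim_{n \to \infty} Pr(|\bar u_n - U| \ge \varepsilon) = 0$$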


Exercises

7.1. If u has possible values $U_1, \ldots, U_N$, all equally probable, compare the probabilities of the inequality $|u - Eu| \ge \varepsilon$ obtained from the Tchebycheff inequality with the true probabilities.

7.2. By setting $\varepsilon = 3\sigma_u$ in Eq. 7.1, show that no more than one-ninth of the time will an estimate differ from the average of all its possible results by more than three times its standard deviation, no matter what the population.

Remark 1. With the aid of the Tchebycheff inequality it is possible to show in what sense the relative frequencies of occurrence of the possible values of a random variable tend to the probabilities of those possible values.

Let $u_1, \ldots, u_n$ be n independent random variables, each having possible values $U_1, \ldots, U_N$ and probabilities $P_1, \ldots, P_N$. Then we say that $u_1, \ldots, u_n$ are n independent random variables having the same distribution, i.e., the same possible values and probabilities. Let $v_{ij}$ be a random variable having two possible values, 1 and 0. Let $v_{ij} = 1$ if and only if $u_j = U_i$, and let $v_{ij} = 0$ otherwise. Then $v_{i1}, \ldots, v_{in}$ are independent and

$$Pr(v_{ij} = 1) = Pr(u_j = U_i) = P_i$$

$$Pr(v_{ij} = 0) = Pr(u_j \ne U_i) = 1 - P_i, \qquad i = 1, \ldots, N; \; j = 1, \ldots, n$$

If

$$\bar v_{in} = \frac{v_{i1} + \cdots + v_{in}}{n}$$

then the possible values of $\bar v_{in}$ are the relative frequencies with which $U_i$ may occur in performing the n operations. Since

$$Ev_{ij} = 1 \cdot P_i + 0 \cdot (1 - P_i) = P_i$$

it follows from Theorem 6 that $E\bar v_{in} = P_i$. Since $\sigma^2_{v_{ij}} = P_i(1 - P_i)$ for all j, and since $v_{i1}, \ldots, v_{in}$ are independent, it follows from Corollary 3 to Theorem 11 that

$$\sigma^2_{\bar v_{in}} = \frac{P_i(1 - P_i)}{n}$$

Then, from Theorem 18, we have

$$Pr(|\bar v_{in} - P_i| \ge \varepsilon) \le \frac{P_i(1 - P_i)}{n\varepsilon^2}$$

so that

$$\lim_{n \to \infty} Pr(|\bar v_{in} - P_i| \ge \varepsilon) = 0$$

i.e., the probability that the relative frequency of $U_i$ differs from its probability by more than any positive quantity, however small, will tend to zero as n increases.
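The convergence described in Remark 1 can be watched directly. The number of times $U_i$ occurs in n independent repetitions has a binomial distribution, so the exact probability that the relative frequency differs from $P_i$ by at least $\varepsilon$ can be compared with the Tchebycheff bound $P_i(1 - P_i)/n\varepsilon^2$. The values of $P_i$ and $\varepsilon$ below are hypothetical; exact fractions avoid boundary errors in the comparison $|k/n - P_i| \ge \varepsilon$.

```python
from fractions import Fraction
from math import comb

def tail_prob(n, P, eps):
    """Exact Pr(|k/n - P| >= eps) when the count k has the binomial(n, P) distribution."""
    return sum((comb(n, k) * P**k * (1 - P)**(n - k)
                for k in range(n + 1) if abs(Fraction(k, n) - P) >= eps), Fraction(0))

def tchebycheff_bound(n, P, eps):
    """Bound P(1 - P) / (n eps^2) from Theorem 18."""
    return P * (1 - P) / (n * eps**2)

P, eps = Fraction(3, 10), Fraction(1, 10)   # hypothetical P_i and tolerance
for n in (10, 50, 200):
    print(n, float(tail_prob(n, P, eps)), float(tchebycheff_bound(n, P, eps)))
```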


b. Convergence in probability. Consistency. We have already defined biased and unbiased estimates in Sec. 2 of this chapter. We shall now define consistency, which is often a more useful property than unbiasedness. To do this we shall first define convergence in probability, from which the definition of consistency follows almost immediately.

Definition of convergence in probability. We call a set of random variables $v_1, v_2, \ldots$ that are identified by the positive integers a sequence of random variables; i.e., given the positive integer i, there is exactly one random variable $v_i$ that is identified, $i = 1, 2, \ldots$.

A sequence of random variables $v_1, v_2, \ldots$ is said to converge in probability to a random variable or constant v if, for every $\varepsilon > 0$,

$$\lim_{n \to \infty} Pr(|v_n - v| \ge \varepsilon) = 0$$

or, equivalently,

$$\lim_{n \to \infty} Pr(|v_n - v| < \varepsilon) = 1$$

Definition of consistency. A sequence of estimates $u_1, u_2, \ldots$ is said to be a consistent estimate of U if the sequence $u_1, u_2, \ldots$ converges in probability to U. Thus, if $u_1, u_2, \ldots$ are uncorrelated random variables having a common mean U and common variance $\sigma^2$, and if $\bar u_1, \bar u_2, \ldots$ is the sequence of arithmetic means, where

$$\bar u_n = \frac{u_1 + \cdots + u_n}{n}, \qquad n = 1, 2, \ldots$$

then we have proved in Corollary 2 to Theorem 18 that the sequence of arithmetic means converges in probability to the expected value U. Therefore $\bar u_1, \bar u_2, \ldots$ is a consistent estimate of U. Also, we have shown in Remark 1 (p. 71) that the relative frequencies with which each of the possible results occurs when the same operation is repeated will converge in probability to the probability of the occurrence of that possible result as the number of repetitions increases. Thus, the sequence of relative frequencies of $U_i$ defined in the Remark is a consistent estimate of $P_i$, $i = 1, 2, \ldots, N$.

In general, the expected value of the limiting value of a sequence of random variables is equal to the limiting value of the sequence of the expected values of the random variables. There are some minor exceptions, such as Illustration 7.1 below, but these exceptions cannot occur if all the variables are bounded in absolute value by some common constant.

Illustration 7.1. Convergence in probability may occur without the expected values of the sequence of random variables tending to the expected value of the limiting random variable. For example, let $u_n$ have two possible values, 0 and n, and let

$$Pr(u_n = 0) = \frac{n - 1}{n}$$

$$Pr(u_n = n) = \frac{1}{n}$$

Then the sequence of random variables $u_1, u_2, \ldots$ converges in probability to 0, since, for any $\varepsilon > 0$,

$$\lim_{n \to \infty} Pr(|u_n - 0| \ge \varepsilon) = 0$$

However,

$$Eu_n = 0 \cdot \frac{n - 1}{n} + n \cdot \frac{1}{n} = 1$$

for all n, so that

$$\lim_{n \to \infty} Eu_n \ne E\left(\mathrm{plim}_{n \to \infty}\, u_n\right)$$

where "plim" stands for "limit in the sense of probability."
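Illustration 7.1 can be tabulated exactly. The sketch below computes, for several n, the expected value, the variance, and the tail probability $Pr(|u_n| \ge \varepsilon)$ (the same for every $0 < \varepsilon \le n$) of the random variable $u_n$ defined there. The function name is illustrative.

```python
from fractions import Fraction

def moments_u_n(n):
    """u_n = 0 with probability (n-1)/n and u_n = n with probability 1/n."""
    dist = {0: Fraction(n - 1, n), n: Fraction(1, n)}
    mean = sum(p * x for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())
    tail = dist[n]   # Pr(|u_n - 0| >= eps) for any 0 < eps <= n
    return mean, var, tail

for n in (2, 10, 100, 1000):
    print(n, *moments_u_n(n))   # mean stays 1, variance is n - 1, tail is 1/n
```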








The definition of consistency given above is useful when the sampling is done with replacement or the population is infinite. However, a great deal of its usefulness is lost when sampling from a finite population. For example, suppose that a population consists of 2000 elementary units, classified into two groups each consisting of 1000 elementary units, all units in the first group being 1's and all units in the second group being 0's. Suppose, also, that the sample is obtained by selecting elements from group 1 until the elementary units in the first group are exhausted, and then selecting the remaining elements for the sample from the second group. Now suppose that the estimate $u_n$ of the proportion of 1's in the population is defined to be the proportion of 1's in the sample of size n. Then $u_n = 1$, $n = 1, \ldots, 1000$, and $u_n = 1000/n$, $n = 1001, \ldots, 2000$, so that $u_{2000} = 1/2$, the population proportion. If the sample were selected with replacement, $u_n$ would not be a consistent estimate of the proportion of 1's in the population, since the estimate would always be 1.

To avoid this type of contradiction, we shall make the following assumptions for any finite population. We shall require that, to meet the definition of consistency, the following two conditions be satisfied.

(1) As the size of sample n increases, the size of population N will also increase, and for all n and N we will have $n \le tN$, where $0 < t < 1$.

(2) As the size of population increases, the quantity U that we want to estimate will remain constant.

If these two assumptions are made, then the sequence of estimates discussed in the example above would be inconsistent.

When "plim" is written, we mean the limit in probability as n becomes infinite, subject to the above two conditions if we are sampling from a finite population.

We give two simple theorems that enable us to prove consistency in many cases.

Theorem 19. Let $Eu_n = U_n$, and let $\sigma^2_{u_n} = \sigma_n^2$. Then a sufficient condition that the sequence $u_1, u_2, \ldots$ be a consistent estimate of U is that both of the following conditions are satisfied:

(1) $\lim_{n \to \infty}|U_n - U| = 0$

(2) $\lim_{n \to \infty}\sigma_n^2 = 0$

Proof. By Tchebycheff's inequality (Theorem 18, with c = U), we have

$$Pr(|u_n - U| \ge \varepsilon) \le \frac{\sigma_n^2 + (U_n - U)^2}{\varepsilon^2}$$

It follows that if (1) and (2) hold, then the sequence $u_1, u_2, \ldots$ is a consistent estimate of U.




Corollary. If $u_n$ is an unbiased estimate of U, $n = 1, 2, \ldots$, and if $\lim_{n \to \infty}\sigma_n^2 = 0$, then $u_1, u_2, \ldots$ is a consistent estimate of U.

Proof. The proof consists of noting that $U_n = U$ in Theorem 19.

The following theorem states that any continuous function of consistent estimates is itself consistent, and the value to which it tends is the same function of the limiting values of the consistent estimates.

Theorem 20. Let the sequence $u_{i1}, u_{i2}, \ldots$ be a consistent estimate of $U_i$, $i = 1, \ldots, k$, and let $f(t_1, \ldots, t_k)$ be a continuous function of $t_1, \ldots, t_k$. Then the sequence $f(u_{1n}, \ldots, u_{kn})$, $n = 1, 2, \ldots$, is a consistent estimate of $f(U_1, \ldots, U_k)$.

Proof. Since $f(t_1, \ldots, t_k)$ is continuous, it follows that, given any $\varepsilon > 0$, we can find $\delta > 0$ such that, if $|u_{1n} - U_1| < \delta, \ldots, |u_{kn} - U_k| < \delta$, then $|f(u_{1n}, \ldots, u_{kn}) - f(U_1, \ldots, U_k)| < \varepsilon$. Hence

$$Pr[|f(u_{1n}, \ldots, u_{kn}) - f(U_1, \ldots, U_k)| < \varepsilon] \ge Pr(|u_{1n} - U_1| < \delta, \ldots, |u_{kn} - U_k| < \delta)$$

Now

$$Pr(|u_{1n} - U_1| < \delta, \ldots, |u_{kn} - U_k| < \delta) + Pr(\text{at least one of the inequalities } |u_{in} - U_i| \ge \delta \text{ holds}) = 1$$

since one of the two events in parentheses is certain to occur. Also

$$Pr(\text{at least one of the inequalities } |u_{in} - U_i| \ge \delta \text{ holds}) \le \sum_{i=1}^{k} Pr(|u_{in} - U_i| \ge \delta)$$

since, if the event in the parentheses on the left occurs, one or more of the events on the right will occur. But, since the sequence $u_{in} - U_i$ converges in probability to zero, $i = 1, \ldots, k$, it follows that the right side tends to zero, and hence

$$\lim_{n \to \infty} Pr(\text{at least one of the } k \text{ inequalities } |u_{in} - U_i| \ge \delta \text{ holds}) = 0$$

so that, for any $\varepsilon > 0$,

$$\lim_{n \to \infty} Pr[|f(u_{1n}, \ldots, u_{kn}) - f(U_1, \ldots, U_k)| < \varepsilon] = 1$$

which completes the proof.

Corollary 1. If/( 4 , ♦ • *, 4) is a polynomial in 4, • • •, 4, then 
/(«!„, • • *, is a consistent estimate of /(U^, • • ♦, C/^) if is a 
consistent estimate of U^, / = 1 , • • 

Corollary 2. A rational function of consistent estimates is a consistent estimate of the same rational function of the quantities being estimated, provided the denominator does not vanish. More formally, if $f(t_1, \ldots, t_k)$ is a rational function of $t_1, \ldots, t_k$, then $f(u_{1n}, \ldots, u_{kn})$ is a consistent estimate of $f(U_1, \ldots, U_k)$ for all possible values of $U_1, \ldots, U_k$ except those for which the denominator of $f(t_1, \ldots, t_k)$ vanishes, provided that $u_{in}$ is a consistent estimate of $U_i$, $i = 1, \ldots, k$.

Proof. A rational function is continuous for all finite values of its variables except those for which its denominator vanishes. Hence, if we exclude the values for which the denominator vanishes, we will be limiting ourselves to values for which $f(t_1, \ldots, t_k)$ is continuous, and Theorem 20 applies.
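As a rough numerical illustration of Theorem 20 and Corollary 2, the sketch below applies the rational function $f(t_1, t_2) = t_1/t_2$ to two sample means. The population is an assumption made only for this example ($u$ uniform on (0, 2) and $v$ uniform on (1, 3), so $f(U_1, U_2) = 1/2$); the seed and sample sizes are arbitrary.

```python
import random

def ratio_of_means(n, seed=2024):
    """Draw n independent pairs (u_i, v_i) and return mean(u) / mean(v).

    Assumed population for illustration only: u_i ~ Uniform(0, 2), so U = 1,
    and v_i ~ Uniform(1, 3), so V = 2; the target f(U, V) = U / V is 0.5.
    """
    rng = random.Random(seed)
    su = sv = 0.0
    for _ in range(n):
        su += rng.uniform(0.0, 2.0)
        sv += rng.uniform(1.0, 3.0)
    return (su / n) / (sv / n)

for n in (10, 1_000, 100_000):
    est = ratio_of_means(n)
    # the error |f(u_n, v_n) - f(U, V)| should shrink as n grows
    print(n, round(est, 4), round(abs(est - 0.5), 4))
```

The denominator mean stays near 2 here, so the excluded case of Corollary 2 (a vanishing denominator) does not arise.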

It is easy to show that an estimate may be consistent and yet have a variance that becomes infinite as n increases. If we consider the sequence of random variables discussed in Remark 1 of this section, then we have already shown that $u_1, u_2, \ldots$ is a consistent estimate of 0, since the sequence converges in probability to 0. We have also shown that $Eu_n = 1$. Now $\sigma_{u_n}^2 = Eu_n^2 - (Eu_n)^2 = Eu_n^2 - 1$, and for this sequence $Eu_n^2 = n$, so that $\sigma_{u_n}^2 = n - 1$. Hence $\sigma_{u_n}^2$ becomes larger and larger as n increases, even though $u_n$ converges in probability to zero.
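The exact computation below illustrates the point with one two-point sequence having these properties. The distribution is an assumption for illustration and is not claimed to be the sequence of Remark 1: $u_n = n$ with probability $1/n$ and $u_n = 0$ otherwise.

```python
from fractions import Fraction

def two_point_moments(n):
    """Exact E u_n, sigma^2_{u_n}, and Pr(|u_n| >= eps) for the assumed sequence
    u_n = n with probability 1/n and u_n = 0 otherwise (any 0 < eps <= n)."""
    p = Fraction(1, n)
    mean = n * p                  # E u_n
    second = n * n * p            # E u_n^2
    variance = second - mean ** 2
    tail = p                      # |u_n| >= eps only when u_n = n
    return mean, variance, tail

for n in (2, 10, 100, 10_000):
    print(n, *two_point_moments(n))
```

The tail probability $1/n$ tends to zero (consistency for 0) while the variance $n - 1$ grows without bound.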

Remark 2. Satisfactory discussions of the normal limiting distribution may be found in almost any standard text on probability theory. It has been shown [see, for example, W. G. Madow, "On the Limiting Distributions of Estimates Based on Samples from Finite Universes," Annals Math. Stat., 19 (1948), 535-545] that these limiting distributions also are valid for samples from finite populations.



APPENDIX A 


Sums of Powers 


By the Kth sum of powers about $a$ is meant

$$\sum_{i=1}^{M} (x_i - a)^K$$

Ordinarily we shall be concerned with sums of powers about 0 ($a = 0$) and sums of powers about $\bar x$ ($a = \bar x$, the arithmetic mean of $x_1, \ldots, x_M$). Sums of powers about 0 will be denoted by $S_1^*, S_2^*, \ldots$, and sums of powers about $\bar x$ will be denoted by $S_1, S_2, \ldots$.

In order to evaluate the $S_K$'s in terms of the $S_K^*$'s, let us recall the binomial theorem for positive integral exponents.

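Binomial theorem. If K is any positive integer, then

$$(a + b)^K = a^K + C_1^K a^{K-1} b + C_2^K a^{K-2} b^2 + \cdots + b^K = \sum_{i=0}^{K} C_i^K a^{K-i} b^i \qquad \text{(A.1)}$$

where

$$C_i^K = \frac{K!}{i!\,(K - i)!}$$

and we define $C_0^K = 1$ (that is, $0! = 1$).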

Proof. If K = 1 or K = 2, the theorem may be proved by expanding the parentheses. Suppose now that the theorem is true for K. Then we will prove it to be true for K + 1. Since it is true for K = 1, this will then complete the proof for all positive integers. Since $(a + b)^{K+1} = (a + b)(a + b)^K$, and since Eq. A.1 is assumed to hold, we have

$$(a + b)^{K+1} = (a + b)\sum_{i=0}^{K} C_i^K a^{K-i} b^i$$
$$= \sum_{i=0}^{K} C_i^K a^{K-i+1} b^i + \sum_{i=0}^{K} C_i^K a^{K-i} b^{i+1}$$
$$= a^{K+1} + (C_1^K + C_0^K)a^K b + (C_2^K + C_1^K)a^{K-1} b^2 + \cdots + (C_K^K + C_{K-1}^K)a b^K + b^{K+1} \qquad \text{(A.2)}$$


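Hence we want to evaluate $C_{i+1}^K + C_i^K$. Now by the definition we have

$$C_{i+1}^K + C_i^K = \frac{K!}{(i + 1)!\,(K - i - 1)!} + \frac{K!}{i!\,(K - i)!}$$
$$= \frac{K!}{i!\,(K - i - 1)!}\left[\frac{1}{i + 1} + \frac{1}{K - i}\right]$$
$$= \frac{(K + 1)!}{(i + 1)!\,(K - i)!} = C_{i+1}^{K+1}$$

Hence we have proved the important equation

$$C_{i+1}^{K+1} = C_{i+1}^K + C_i^K, \qquad i = 0, \ldots, K - 1;\; K > 0 \qquad \text{(A.3)}$$

Substituting in Eq. A.2, and noting that the first and last coefficients are $C_0^{K+1} = C_{K+1}^{K+1} = 1$, we have

$$(a + b)^{K+1} = \sum_{i=0}^{K+1} C_i^{K+1} a^{K+1-i} b^i \qquad \text{(A.4)}$$

which is the same formula as Eq. A.1 but with K replaced by K + 1. Hence, the theorem is true for K + 1 if it is true for K. (This is a proof by induction on K.)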
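A quick check of Eq. A.1 and Eq. A.3 with Python's standard `math.comb` (the function names here are illustrative only). Integer arithmetic makes the comparisons exact.

```python
from math import comb

def binomial_rhs(a, b, K):
    """Right-hand side of Eq. A.1: sum over i of C_i^K a^(K-i) b^i."""
    return sum(comb(K, i) * a ** (K - i) * b ** i for i in range(K + 1))

def pascal_rule_holds(K):
    """Eq. A.3: C_{i+1}^{K+1} = C_{i+1}^K + C_i^K for i = 0, ..., K-1."""
    return all(comb(K + 1, i + 1) == comb(K, i + 1) + comb(K, i) for i in range(K))

for K in range(1, 8):
    assert binomial_rhs(3, -2, K) == (3 - 2) ** K
    assert pascal_rule_holds(K)
print("Eq. A.1 and Eq. A.3 hold for K = 1, ..., 7 in this check")
```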

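Let us now apply the binomial theorem and some previous results to obtain some relations between the $S_K$ and the $S_K^*$.

Theorem A.1. If we define $S_0 = S_0^* = M$, then

$$S_K = \sum_{i=0}^{K} C_i^K (-\bar x)^i S_{K-i}^* \qquad \text{(A.5)}$$

$$S_K^* = \sum_{j=0}^{K} C_j^K \bar x^j S_{K-j} \qquad \text{(A.6)}$$

Proof. By definition $S_K = \sum_{l=1}^{M} (x_l - \bar x)^K$, and by the binomial theorem

$$(x_l - \bar x)^K = \sum_{i=0}^{K} C_i^K (-1)^i \bar x^i x_l^{K-i}$$

Hence

$$S_K = \sum_{l=1}^{M}\sum_{i=0}^{K} C_i^K (-1)^i \bar x^i x_l^{K-i}$$
$$= \sum_{i=0}^{K}\sum_{l=1}^{M} C_i^K (-1)^i \bar x^i x_l^{K-i} \qquad \text{(by Ex. 2.10, Ch. 2)}$$
$$= \sum_{i=0}^{K} C_i^K (-1)^i \bar x^i \sum_{l=1}^{M} x_l^{K-i} \qquad \text{(by Eq. 2.1, Ch. 2)}$$
$$= \sum_{i=0}^{K} C_i^K (-\bar x)^i S_{K-i}^*$$

Hence Eq. A.5 is true. To prove Eq. A.6, replace $x_l$ by $(x_l - \bar x) + \bar x$ in the definition of $S_K^*$ and apply the binomial theorem with $a = x_l - \bar x$, $b = \bar x$. This is left as an exercise.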

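Let us now write out Eq. A.5 for K = 1, 2, 3, 4.

$$S_1 = S_1^* - M\bar x = 0$$
$$S_2 = S_2^* - 2\bar x S_1^* + M\bar x^2 = S_2^* - M\bar x^2$$
$$S_3 = S_3^* - 3\bar x S_2^* + 3\bar x^2 S_1^* - M\bar x^3 = S_3^* - 3\bar x S_2^* + 2M\bar x^3$$
$$S_4 = S_4^* - 4\bar x S_3^* + 6\bar x^2 S_2^* - 4\bar x^3 S_1^* + M\bar x^4 = S_4^* - 4\bar x S_3^* + 6\bar x^2 S_2^* - 3M\bar x^4$$

It is easy to verify that the last two terms of $S_K$ will always combine, since they are

$$C_{K-1}^K (-\bar x)^{K-1} S_1^* + (-\bar x)^K M = K(-1)^{K-1} M\bar x^K + (-1)^K M\bar x^K = (-1)^{K-1}(K - 1)M\bar x^K$$

Also, from Eq. A.6 we have

$$S_1^* = S_1 + M\bar x = M\bar x$$
$$S_2^* = S_2 + 2\bar x S_1 + M\bar x^2 = S_2 + M\bar x^2$$
$$S_3^* = S_3 + 3\bar x S_2 + 3\bar x^2 S_1 + M\bar x^3 = S_3 + 3\bar x S_2 + M\bar x^3$$
$$S_4^* = S_4 + 4\bar x S_3 + 6\bar x^2 S_2 + 4\bar x^3 S_1 + M\bar x^4 = S_4 + 4\bar x S_3 + 6\bar x^2 S_2 + M\bar x^4$$

since $S_1 = 0$ as previously shown.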
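The relations of Theorem A.1 can be checked exactly on any small data set. This is a minimal sketch that uses rational arithmetic (`fractions.Fraction`) so that no rounding enters. The data values are hypothetical.

```python
from fractions import Fraction
from math import comb

def power_sums(xs, K):
    """Return (S_K, S*_K): the Kth sums of powers about the mean and about 0."""
    M = len(xs)
    xbar = Fraction(sum(xs), M)
    about_mean = sum((Fraction(x) - xbar) ** K for x in xs)
    about_zero = sum(Fraction(x) ** K for x in xs)
    return about_mean, about_zero

def check_theorem_A1(xs, K):
    """Compare both sides of Eq. A.5 and Eq. A.6, using S_0 = S*_0 = M."""
    xbar = Fraction(sum(xs), len(xs))
    S = [power_sums(xs, k)[0] for k in range(K + 1)]
    S_star = [power_sums(xs, k)[1] for k in range(K + 1)]
    rhs_a5 = sum(comb(K, i) * (-xbar) ** i * S_star[K - i] for i in range(K + 1))
    rhs_a6 = sum(comb(K, j) * xbar ** j * S[K - j] for j in range(K + 1))
    return rhs_a5 == S[K] and rhs_a6 == S_star[K]

data = [2, 3, 7, 11, 4]      # hypothetical values x_1, ..., x_M
print([check_theorem_A1(data, K) for K in range(1, 7)])
```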

Let us now evaluate $\sum_{i=1}^{N} i$, $\sum_{i=1}^{N} i^2$, and $\sum_{i=1}^{N} i^3$. Let

$$S_K^*(N) = \sum_{i=1}^{N} i^K$$

then

$$S_K^*(N + 1) - S_K^*(N) = (N + 1)^K \qquad \text{(A.7)}$$

But also

$$S_K^*(N + 1) = 1 + \sum_{i=1}^{N} (i + 1)^K$$

so that

$$S_K^*(N + 1) - S_K^*(N) = 1 + \sum_{i=1}^{N} \left[(i + 1)^K - i^K\right] = 1 + C_1^K S_{K-1}^*(N) + C_2^K S_{K-2}^*(N) + \cdots + C_{K-1}^K S_1^*(N) + N \qquad \text{(A.8)}$$

Combining Eq. A.7 and A.8, we have

$$C_1^K S_{K-1}^*(N) + C_2^K S_{K-2}^*(N) + \cdots + C_{K-1}^K S_1^*(N) = (N + 1)^K - (N + 1) = (N + 1)\left[(N + 1)^{K-1} - 1\right] \qquad \text{(A.9)}$$

Putting K = 2, 3, 4 in A.9, we obtain

$$2S_1^*(N) = (N + 1)N$$
$$3S_2^*(N) + 3S_1^*(N) = (N + 1)(N^2 + 2N)$$
$$4S_3^*(N) + 6S_2^*(N) + 4S_1^*(N) = (N + 1)(N^3 + 3N^2 + 3N)$$

so that, as may be verified,

$$S_1^*(N) = \frac{N(N + 1)}{2}, \qquad S_2^*(N) = \frac{N(N + 1)(2N + 1)}{6}, \qquad S_3^*(N) = \frac{N^2(N + 1)^2}{4} \qquad \text{(A.10)}$$
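Eq. A.9 and the closed forms of Eq. A.10 can be compared with brute-force sums. The sketch below is a small verification under the definitions above. It is not part of the original derivation.

```python
from math import comb

def S_star(K, N):
    """S*_K(N) = 1^K + 2^K + ... + N^K by direct summation."""
    return sum(i ** K for i in range(1, N + 1))

def closed_forms(N):
    """Right-hand sides of Eq. A.10 for K = 1, 2, 3 (the divisions are exact)."""
    return (N * (N + 1) // 2,
            N * (N + 1) * (2 * N + 1) // 6,
            N ** 2 * (N + 1) ** 2 // 4)

def check_A9(K, N):
    """Eq. A.9: C_1^K S*_{K-1}(N) + ... + C_{K-1}^K S*_1(N) = (N+1)^K - (N+1)."""
    lhs = sum(comb(K, j) * S_star(K - j, N) for j in range(1, K))
    return lhs == (N + 1) ** K - (N + 1)

for N in (1, 5, 10):
    print(N, closed_forms(N), tuple(S_star(K, N) for K in (1, 2, 3)))
```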




APPENDIX B 


Moments 


The rth moment of the random variable u about the quantity c is defined to be

$$E(u - c)^r$$

If u has possible values $U_1, \ldots, U_N$ with probabilities $P_1, \ldots, P_N$, then, by the definition of expected value,

$$E(u - c)^r = \sum_{i=1}^{N} P_i (U_i - c)^r$$

If u and v are two random variables, then the product moment of order r, s about c, d is defined to be

$$E(u - c)^r (v - d)^s$$

If the possible values of u are $U_1, \ldots, U_N$ and those of v are $V_1, \ldots, V_M$, and if

$$P_{ij} = \Pr(u = U_i, v = V_j)$$

then

$$E(u - c)^r (v - d)^s = \sum_{i=1}^{N}\sum_{j=1}^{M} P_{ij} (U_i - c)^r (V_j - d)^s$$

If $P_{ij} = 0$ when $i \ne j$, i.e., the elementary event $(U_i, V_j)$ is impossible if $i \ne j$, then we denote $P_{ii}$ by $P_i$ and

$$E(u - c)^r (v - d)^s = \sum_{i=1}^{\min(N, M)} P_i (U_i - c)^r (V_i - d)^s$$

where $\min(N, M)$ is the smaller of N and M.

Moments about 0 and about the expected values of the random variables are the two most important special cases. We define

$$\mu_r^* = Eu^r, \qquad \mu_r = E(u - Eu)^r$$

and

$$\mu_{rs}^* = Eu^r v^s, \qquad \mu_{rs} = E(u - Eu)^r (v - Ev)^s$$

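Also, by definition, $\mu_0^* = \mu_0 = 1$.

Remark. If r is an integer, then by the binomial theorem

$$(u - c)^r = \sum_{\alpha=0}^{r} C_\alpha^r (-c)^\alpha u^{r-\alpha}$$

and

$$(v - d)^s = \sum_{\beta=0}^{s} C_\beta^s (-d)^\beta v^{s-\beta}$$

Also

$$u^r = [(u - c) + c]^r = \sum_{\alpha=0}^{r} C_\alpha^r c^\alpha (u - c)^{r-\alpha}$$

and similarly

$$v^s = \sum_{\beta=0}^{s} C_\beta^s d^\beta (v - d)^{s-\beta}$$

Hence, if we let $\mu_1^* = \mu$, we have

$$\mu_r = \sum_{\alpha=0}^{r} C_\alpha^r (-\mu)^\alpha \mu_{r-\alpha}^*$$

Also

$$\mu_r^* = \sum_{\alpha=0}^{r} C_\alpha^r \mu^\alpha \mu_{r-\alpha}$$

in which the term with $\alpha = r - 1$ vanishes, since

$$\mu_1 = 0$$

It is left as an exercise to express the $\mu_{rs}$'s as a linear combination of the $\mu_{rs}^*$'s, and conversely.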

Illustration B.1. (a) If u is a random variable having two possible values 1 and 0 with probabilities P and $Q = 1 - P$, compute $\mu_j^*$ and $\mu_j$, $j = 1, 2, 3, 4$. By definition

$$\mu_j^* = Eu^j = \sum_{i=1}^{2} P_i U_i^j = P(1^j) + Q(0^j) = P, \qquad j = 1, 2, \ldots$$

$$\mu_j = P(1 - P)^j + Q(-P)^j = PQ^j + (-1)^j QP^j = PQ\left[Q^{j-1} + (-1)^j P^{j-1}\right], \qquad j = 1, 2, \ldots$$

Hence

$$\mu_1 = 0, \qquad \mu_2 = PQ(Q + P) = PQ$$
$$\mu_3 = PQ(Q^2 - P^2) = PQ(Q - P)$$
$$\mu_4 = PQ(Q^3 + P^3)$$

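(b) If u and v are random variables with possible values 0, 1 and if $\Pr(u = i, v = j) = P_{ij}$, $i, j = 0, 1$, then

$$\mu_{11}^* = (1)(1)P_{11} + (1)(0)P_{10} + (0)(1)P_{01} + (0)(0)P_{00} = P_{11}$$

$$\mu_{11} = P_{11}(1 - P_{1\cdot})(1 - P_{\cdot 1}) + P_{01}(-P_{1\cdot})(1 - P_{\cdot 1}) + P_{10}(1 - P_{1\cdot})(-P_{\cdot 1}) + P_{00}(-P_{1\cdot})(-P_{\cdot 1})$$
$$= P_{11}Q_{1\cdot}Q_{\cdot 1} - P_{01}P_{1\cdot}Q_{\cdot 1} - P_{10}P_{\cdot 1}Q_{1\cdot} + P_{00}P_{1\cdot}P_{\cdot 1}$$

where

$$P_{1\cdot} = P_{10} + P_{11} = \Pr(u = 1) = 1 - Q_{1\cdot}$$
$$P_{\cdot 1} = P_{01} + P_{11} = \Pr(v = 1) = 1 - Q_{\cdot 1}$$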
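Parts (a) and (b) can be checked with exact fractions. The probabilities below are hypothetical. The check also confirms that $\mu_{11}$ reduces to $P_{11} - P_{1\cdot}P_{\cdot 1}$, the covariance of u and v.

```python
from fractions import Fraction

def bernoulli_moments(P, j):
    """(mu*_j, mu_j) for u = 1 with probability P and u = 0 with probability Q = 1 - P."""
    Q = 1 - P
    mu_star = P * 1 ** j + Q * 0 ** j
    mu = P * (1 - P) ** j + Q * (0 - P) ** j
    return mu_star, mu

def product_moments(P11, P10, P01, P00):
    """(mu*_11, mu_11, P_1., P_.1) for 0-1 variables with Pr(u = i, v = k) = P_ik."""
    P1_ = P10 + P11                      # Pr(u = 1)
    P_1 = P01 + P11                      # Pr(v = 1)
    cells = {(1, 1): P11, (1, 0): P10, (0, 1): P01, (0, 0): P00}
    mu_star_11 = sum(p * i * k for (i, k), p in cells.items())
    mu_11 = sum(p * (i - P1_) * (k - P_1) for (i, k), p in cells.items())
    return mu_star_11, mu_11, P1_, P_1

P = Fraction(3, 10)
print([bernoulli_moments(P, j) for j in (1, 2, 3, 4)])
print(product_moments(Fraction(2, 10), Fraction(1, 10), Fraction(3, 10), Fraction(4, 10)))
```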

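(c) Let $\bar u$ be the arithmetic mean of $u_1, \ldots, u_n$. Then, as may be verified by multiplication,

$$\mu_1^*(\bar u) = \frac{1}{n}(Eu_1 + \cdots + Eu_n)$$

$$\mu_2^*(\bar u) = \frac{1}{n^2}\left[\sum_i Eu_i^2 + \sum_{i \ne j} Eu_i u_j\right]$$

$$\mu_3^*(\bar u) = \frac{1}{n^3}\left[\sum_i Eu_i^3 + 3\sum_{i \ne j} Eu_i^2 u_j + \sum_{i \ne j \ne k} Eu_i u_j u_k\right]$$

$$\mu_4^*(\bar u) = \frac{1}{n^4}\left[\sum_i Eu_i^4 + 4\sum_{i \ne j} Eu_i^3 u_j + 3\sum_{i \ne j} Eu_i^2 u_j^2 + 6\sum_{i \ne j \ne k} Eu_i^2 u_j u_k + \sum_{i \ne j \ne k \ne m} Eu_i u_j u_k u_m\right]$$

Also $\mu_r(\bar u)$ is of the same form as $\mu_r^*(\bar u)$ but with $u_i - Eu_i$ in place of $u_i$.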
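The coefficients 1, 4, 3, 6, 1 in $\mu_4^*(\bar u)$ can be confirmed by brute force. The sketch classifies every index tuple of $(u_1 + \cdots + u_n)^4$ by the multiplicities of its distinct indices. Each sum in Illustration B.1(c) runs over ordered distinct indices, so the coefficient is the tuple count divided by $n(n-1)\cdots(n-h+1)$. The helper names are hypothetical.

```python
from collections import Counter
from itertools import product

def falling(n, h):
    """n(n-1)...(n-h+1): the number of ordered tuples of h distinct indices."""
    out = 1
    for r in range(h):
        out *= n - r
    return out

def expansion_coefficients(n, power):
    """Coefficient of each sum in the expansion of (u_1 + ... + u_n)^power.

    A pattern such as (2, 1, 1) stands for the sum of u_i^2 u_j u_k over
    ordered distinct i, j, k. Requires n >= power so every pattern occurs.
    """
    counts = Counter()
    for idx in product(range(n), repeat=power):
        counts[tuple(sorted(Counter(idx).values(), reverse=True))] += 1
    return {pattern: c // falling(n, len(pattern)) for pattern, c in counts.items()}

print(expansion_coefficients(5, 4))
```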


Exercises 

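B.1. If u has possible values $U_1$ and $U_2$ with probabilities P and Q, compute $\mu_j^*$ and $\mu_j$, $j = 1, 2, 3, 4$.

B.2. Verify

(a) $\mu_2 = \mu_2^* - \mu^2$, $\quad \mu_3 = \mu_3^* - 3\mu\mu_2^* + 2\mu^3$, $\quad \mu_4 = \mu_4^* - 4\mu\mu_3^* + 6\mu^2\mu_2^* - 3\mu^4$

(b) $\mu_2^* = \mu_2 + \mu^2$, $\quad \mu_3^* = \mu_3 + 3\mu\mu_2 + \mu^3$, $\quad \mu_4^* = \mu_4 + 4\mu\mu_3 + 6\mu^2\mu_2 + \mu^4$

B.3. If u has possible values $1, \ldots, N$ with equal probabilities, compute $\mu_j^*$ and $\mu_j$. Evaluate $\mu_j^*$ and $\mu_j$, $j = 1, 2, 3, 4$, if $N = 11$.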


APPENDIX C 


Rapidity of Approach to Limit 


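Let $f_n$ be some function of n and suppose that

$$\lim_{n \to \infty} n^h |f_n| = K < \infty$$

where K is a constant. Then we write

$$f_n = O(n^{-h})$$

and say that $f_n$ is of order $n^{-h}$. If K = 0 we will sometimes say

$$f_n = o(n^{-h})$$

Similarly, if $f_{nN}$ is a function of n and N and if

$$\lim_{\substack{n, N \to \infty \\ n = tN,\; 0 < t < 1}} n^h |f_{nN}| = K < \infty$$

where K is a constant, then we shall also write

$$f_{nN} = O(n^{-h})$$

Of course h may be 0, which we indicate by replacing $n^{-h}$ by 1. If $f_n = O(1)$ or $f_N = O(1)$ or $f_{nN} = O(1)$, then $f_n$ or $f_N$ or $f_{nN}$ is bounded as n, N, or both n and N become infinite. Thus if

$$\sigma_{\bar u}^2 = \frac{N - n}{N - 1}\,\frac{\sigma^2}{n}$$

it follows that

$$\frac{n\,\sigma_{\bar u}^2}{\sigma^2} = \frac{N - n}{N - 1} = O(1)$$

Since, by the Tchebycheff inequality,

$$\Pr(|\bar u - \bar U| \ge \lambda) \le \frac{\sigma_{\bar u}^2}{\lambda^2} = \frac{N - n}{N - 1}\,\frac{\sigma^2}{n\lambda^2}$$

if $\bar u$ is the mean of a simple random sample of n elements from a population of N elements having $\bar U$ as mean and $\sigma^2$ as variance, it follows that, if $\sigma^2$ is a constant or if $\sigma^2 = O(1)$, then

$$\lim_{n \to \infty} \Pr(|\bar u - \bar U| \ge \lambda) = 0 \qquad \text{for every } \lambda > 0$$

It should be noted that even if $\lim \sigma^2 = \infty$, it may still be that $\bar u - \bar U$ converges in probability to 0, but the rate of approach may be slower. For example, suppose that $\sigma^2$ increases without bound as N increases, but more slowly than N, and that $n = tN$, $0 < t < 1$. Then $\sigma_{\bar u}^2 = \frac{N - n}{N - 1}\frac{\sigma^2}{n}$ still tends to zero, but more slowly than $1/n$.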

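For simplicity we shall now suppose that $u_1, \ldots, u_n$ and $v_1, \ldots, v_n$ are obtained by sampling the pairs $(u_i, v_i)$ with replacement from a population such that $Eu_i = U$, $Ev_i = V$, $\sigma_{u_i}^2 = \sigma_u^2$, $\sigma_{v_i}^2 = \sigma_v^2$, $\sigma_{u_i v_i} = \sigma_{uv}$, $i = 1, \ldots, n$. Then, if

$$\bar u_n = \frac{1}{n}\sum_{i=1}^{n} u_i \quad \text{and} \quad \bar v_n = \frac{1}{n}\sum_{i=1}^{n} v_i$$

it follows that

$$\sigma_{\bar u_n}^2 = \frac{\sigma_u^2}{n} = O\left(\frac{1}{n}\right), \qquad \sigma_{\bar v_n}^2 = \frac{\sigma_v^2}{n} = O\left(\frac{1}{n}\right), \qquad \sigma_{\bar u_n \bar v_n} = \frac{\sigma_{uv}}{n} = O\left(\frac{1}{n}\right)$$

$$E(\bar u_n - U)^3 = \frac{1}{n^3}\left[\sum_i E(u_i - U)^3 + 3\sum_{i \ne j} E(u_i - U)^2(u_j - U) + \sum_{i \ne j \ne k} E(u_i - U)(u_j - U)(u_k - U)\right] = \frac{\mu_{3u}}{n^2} = O\left(\frac{1}{n^2}\right)$$

if $\mu_{3u} = E(u_i - U)^3$, $i = 1, \ldots, n$, and similarly $E(\bar v_n - V)^3 = \mu_{3v}/n^2 = O(1/n^2)$ if $\mu_{3v} = E(v_i - V)^3$. Now

$$E(\bar u_n - U)^2(\bar v_n - V) = \frac{1}{n^3}\Big[\sum_i E(u_i - U)^2(v_i - V) + \sum_{i \ne j} E(u_i - U)^2(v_j - V) + 2\sum_{i \ne j} E(u_i - U)(u_j - U)(v_i - V) + \sum_{i \ne j \ne k} E(u_i - U)(u_j - U)(v_k - V)\Big]$$
$$= \frac{\mu_{21}}{n^2} = O\left(\frac{1}{n^2}\right) \qquad \text{if } \mu_{21} = E(u_i - U)^2(v_i - V),\; i = 1, \ldots, n$$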

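Theorem C.1. If the random variables $u_1, \ldots, u_n$ are independent and $Eu_i = U$, $E(u_i - U)^j = \mu_j$, $i = 1, \ldots, n$, $j = 1, 2, \ldots$ ($\mu_1 = 0$), and if $\bar u_n$ is the arithmetic mean of $u_1, \ldots, u_n$, then

$$\lim_{n \to \infty} n^{j/2} E(\bar u_n - U)^j = \frac{j!}{2^{j/2}\,(j/2)!}\,\mu_2^{j/2} \qquad \text{if } j \text{ is even}$$

$$\lim_{n \to \infty} n^{j/2} E(\bar u_n - U)^j = 0 \qquad \text{if } j \text{ is odd}$$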


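Proof. We can suppose without loss of generality that $U = 0$. Then

$$E\bar u_n^j = \frac{1}{n^j}\,E\Big(\sum_i u_i\Big)^j \qquad \text{(C.1)}$$

is a sum of terms like

$$E(u_{i_1}^{j_1} u_{i_2}^{j_2} \cdots u_{i_h}^{j_h}) \qquad \text{(C.2)}$$

where $j_1 + j_2 + \cdots + j_h = j$ and $i_1, \ldots, i_h$ are distinct. If $U = 0$ then $Eu_i = 0$, $i = 1, \ldots, n$, so that any term such as Eq. C.2 with any of the $j$'s equal to 1 will vanish. Hence we can limit ourselves to terms in the expansion of Eq. C.1 for which each of the $j$'s is at least as large as 2. Then

$$E(u_{i_1}^{j_1} u_{i_2}^{j_2} \cdots u_{i_h}^{j_h}) = \mu_{j_1}\mu_{j_2}\cdots\mu_{j_h} \qquad \text{(C.3)}$$

and in the expansion of Eq. C.1 we will have $\mu_{j_1}\mu_{j_2}\cdots\mu_{j_h}$ with a coefficient equal to $n(n-1)\cdots(n-h+1)/n^j$ times a factor that is independent of n. Now, if j is even, then the maximum value of h will be $j/2$, which will occur if $j_1 = \cdots = j_h = 2$; and if j is odd, then the maximum value of h will be $(j - 1)/2$, which will occur if one of the $j$'s is 3 and the others are all 2. Hence, if j is even, the leading term of $E\bar u_n^j$ is $\mu_2^{j/2}$ multiplied by

$$\frac{n(n-1)\cdots\left(n - \tfrac{j}{2} + 1\right)}{n^j}$$

and by the number of ways of distributing j elements into $j/2$ boxes each containing 2 elements, namely

$$\frac{j!}{(2!)^{j/2}\,(j/2)!}$$

All other terms are of smaller order in n, so that

$$\lim_{n \to \infty} n^{j/2} E\bar u_n^j = \frac{j!}{2^{j/2}\,(j/2)!}\,\mu_2^{j/2} \qquad \text{if } j \text{ is even}$$

If j is odd, then every term of $E\bar u_n^j$ has a coefficient of order at most

$$\frac{1}{n^j}\,n(n-1)\cdots\left(n - \tfrac{j-1}{2} + 1\right) = O\left(n^{-(j+1)/2}\right)$$

so that, if j is odd, then

$$E\bar u_n^j = O\left(n^{-(j+1)/2}\right) = o\left(n^{-j/2}\right)$$

which completes the proof.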
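Theorem C.1 can be checked exactly on small discrete distributions chosen for illustration. The first is a $\pm 1$ variable with $U = 0$ and $\mu_2 = 1$, for which $n^2 E\bar u_n^4 = 3 - 2/n$; this approaches the even-j limit $4!/(2^2 \cdot 2!) = 3$. The second is a 0-1 variable with $P = 1/4$, whose third central moment of the mean equals $\mu_3/n^2$, so that $n^{3/2}E(\bar u_n - U)^3 \to 0$. The enumeration is exhaustive, so only small n are practical.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def exact_central_moment_of_mean(values, probs, n, j):
    """E(ubar_n - U)^j for n independent draws from a discrete distribution,
    computed exactly by enumerating all len(values)**n outcomes."""
    U = sum(p * v for v, p in zip(values, probs))
    total = Fraction(0)
    for draw in product(range(len(values)), repeat=n):
        prob = Fraction(1)
        for k in draw:
            prob *= probs[k]
        ubar = sum(values[k] for k in draw) / Fraction(n)
        total += prob * (ubar - U) ** j
    return total

def even_limit(j, mu2):
    """j! / (2^(j/2) (j/2)!) * mu2^(j/2), the even-j limit in Theorem C.1."""
    h = j // 2
    return Fraction(factorial(j), 2 ** h * factorial(h)) * mu2 ** h

half = Fraction(1, 2)
for n in (2, 4, 8, 12):
    # +/-1 with equal probability: U = 0 and mu_2 = 1
    scaled = n ** 2 * exact_central_moment_of_mean([-1, 1], [half, half], n, 4)
    print(n, scaled, float(scaled), even_limit(4, 1))
```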

It may be noted that Theorem C.1 holds if we sample with replacement from a finite population and let the size of the population become infinite in such a manner that its moments are unchanged. It is also possible to prove theorems like Theorem C.1 for sampling without replacement and to show that

$$E(\bar u_n - U)^j (\bar v_n - V)^g = O\left(n^{-(j+g)/2}\right) \qquad \text{if } j + g \text{ is even}$$
$$= O\left(n^{-(j+g+1)/2}\right) \qquad \text{if } j + g \text{ is odd}$$



Corollary. If j is odd, then

$$E|\bar u_n - U|^j = O\left(n^{-j/2}\right)$$

Proof. For any random variable w we have

$$\sigma_{|w|}^2 = E|w|^2 - (E|w|)^2 \ge 0$$

so that

$$E|w|^2 \ge (E|w|)^2$$

and since

$$|w|^2 = w^2$$

we have

$$\sqrt{Ew^2} \ge E|w|$$

Hence, putting $w = (\bar u_n - U)^j$, we have

$$E|\bar u_n - U|^j \le \sqrt{E(\bar u_n - U)^{2j}}$$

and thus, applying Theorem C.1 to the even moment of order 2j,

$$\limsup_{n \to \infty} n^{j/2} E|\bar u_n - U|^j \le \left[\lim_{n \to \infty} n^{j} E(\bar u_n - U)^{2j}\right]^{1/2} = \left[\frac{(2j)!}{2^j\, j!}\right]^{1/2}\mu_2^{j/2} < \infty$$

which completes the proof.

We can similarly show that if j + g is odd, then

$$E|\bar u_n - U|^j\,|\bar v_n - V|^g = O\left(n^{-(j+g)/2}\right)$$

REFERENCES

(1) H. Cramér, Mathematical Methods of Statistics, Princeton University Press.

(2) W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, John Wiley & Sons, New York, 1950.




CHAPTER 4 


Simple Random Sampling 


DERIVATIONS, PROOFS, AND SOME EXTENSIONS OF 
THEORY FOR CH. 4 OF VOL. I* 


Note. In this chapter we shall present a number of derivations and proofs for simple random sampling. As indicated earlier, a simple random sample of n elements (sampling units) is a sample so drawn that every combination of n elements has the same chance of being selected. Simple random sampling with replacement is introduced in some cases in order to simplify results and at the same time to provide approximate theory for sampling without replacement.

Many of the theorems and statements proved in this chapter are applicable 
to other types of selection methods. For example, the derivation of the variance 
of a ratio of random variables, and the derivation of the variance of estimates 
of precision are applicable to more complex systems of sampling. 

1. The mathematical expectation of the arithmetic mean of a simple random sample (Vol. I, Ch. 4, Sec. 7). To prove: A sample mean for a simple random sample of n units drawn from a population of N units is an unbiased estimate of the population mean, i.e.,

$$E\bar x = \bar X$$

where E stands for the mathematical expectation of the expression following it,

$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar X = \frac{1}{N}\sum_{j=1}^{N} X_j \qquad \text{(1.1)}$$

$x_i$ is the value of the characteristic obtained on the ith drawing, and $X_j$ is the value of the characteristic for the jth unit in the population. An estimate is said to be an unbiased estimate of a specified population characteristic if and only if the expected value of the estimate is the population characteristic.

* Appropriate references to Vol. I are shown in parentheses after section or subsection headings. The number following I- after some equations gives the chapter, section, and number of that particular equation in Vol. I.

Proof. Since the expected value of a constant times a variable is the constant times the expected value of the variable (see Ch. 3, Sec. 3, Theorem 3, p. 47), we may write

$$E\bar x = E\,\frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\,E\sum_{i=1}^{n} x_i$$

Further, since the expected value of a sum of random variables is equal to the sum of their expected values (Ch. 3, Sec. 3, Theorem 5, p. 48), we may write

$$E\bar x = \frac{1}{n}\sum_{i=1}^{n} Ex_i \qquad \text{(1.2)}$$

We now need to evaluate $Ex_i$. By definition, the expected value of a variate, u, is the sum of the products obtained by multiplying each possible value of u by its probability, summed over all the possible values u takes on. Applying this definition, we have

$$Ex_i = X_1P_1 + X_2P_2 + \cdots + X_NP_N = \sum_{j=1}^{N} X_jP_j \qquad \text{(1.3)}$$

where $X_1, \ldots, X_N$ are the possible values of $x_i$ and $P_1, \ldots, P_N$ their respective probabilities.

We now must evaluate $P_j$, the probability that $X_j$, the jth element of the population, is selected at the ith drawing. This probability can be written

$$P_j = \frac{N - 1}{N}\cdot\frac{N - 2}{N - 1}\cdots\frac{N - i + 1}{N - i + 2}\cdot\frac{1}{N - i + 1} = \frac{1}{N}$$

(see Ch. 2, Sec. 6, Theorem 3, p. 28, and Sec. 1a of Ch. 2), where the first factor is the probability that $X_j$ is not selected from the whole population of N at the first drawing, the second factor is the probability that $X_j$ is not selected from the remaining $N - 1$ elements at the second drawing, and so forth, the last factor being the probability that $X_j$ is selected from the remaining $N - i + 1$ elements at the ith drawing. Evidently $P_j$ reduces to $1/N$ for all values of i, $i = 1, \ldots, n$. Therefore the probability of selecting any element at any drawing is equal to $1/N$. Substituting $P_j = 1/N$ in Eq. 1.3, we have

$$Ex_i = \frac{1}{N}\sum_{j=1}^{N} X_j = \bar X$$

which shows that the expected value of each observation is the arithmetic mean of the population if simple random sampling is used. Substituting $\bar X$ for $Ex_i$ in Eq. 1.2, we have

$$E\bar x = \frac{1}{n}\sum_{i=1}^{n} \bar X = \frac{1}{n}(n\bar X) = \bar X$$

and hence $\bar x$ is an unbiased estimate of $\bar X$.

Exercise. Show that, if the sample of n elements is selected from a population of N elements at random with replacement, then $E\bar x = \bar X$. Hint: The only change in the proof occurs in the method of evaluation of $P_j$ (which still has the value $1/N$).
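A small exact enumeration can make Sec. 1 concrete. It lists every equally likely ordered sample drawn without replacement from a hypothetical population of five units. It then confirms that each unit has probability $1/N$ at every draw and that $E\bar x = \bar X$.

```python
from fractions import Fraction
from itertools import permutations

def draw_probabilities(population, n):
    """Enumerate every ordered sample of size n drawn without replacement.

    Each of the N(N-1)...(N-n+1) ordered samples is equally likely under simple
    random sampling. Returns (P, e_xbar), where P[i][j] is the probability that
    unit j is selected at draw i + 1 and e_xbar is the exact expectation of xbar.
    """
    N = len(population)
    samples = list(permutations(range(N), n))
    w = Fraction(1, len(samples))            # probability of each ordered sample
    P = [[Fraction(0)] * N for _ in range(n)]
    e_xbar = Fraction(0)
    for s in samples:
        for i, j in enumerate(s):
            P[i][j] += w
        e_xbar += w * Fraction(sum(population[j] for j in s), n)
    return P, e_xbar

population = [3, 8, 1, 9, 4]                 # hypothetical N = 5 population
P, e_xbar = draw_probabilities(population, 3)
print(P[2][4], e_xbar, Fraction(sum(population), len(population)))
```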


2. The variance of the arithmetic mean of a simple random sample (Vol. I, Ch. 4, Sec. 9). To prove: The variance, $\sigma_{\bar x}^2$, of the sample mean of a simple random sample of n units selected from a population of N units without replacement is given by $(1 - f)\,S^2/n$, and if the sample is selected with replacement, the variance is given by $\sigma^2/n$, where $f = n/N$ is the sampling fraction,

$$S^2 = \frac{\sum_{j=1}^{N}(X_j - \bar X)^2}{N - 1} \qquad \text{(2.1 or I-4.3.10)}$$

and

$$\sigma^2 = \frac{\sum_{j=1}^{N}(X_j - \bar X)^2}{N} \qquad \text{(2.2)}$$



Proof.* The variance of any random variable is defined to be the expected value of the square of the deviation between that variable and its expected value (Eq. 4.1, Ch. 3), i.e.,

$$\sigma_{\bar x}^2 = E(\bar x - E\bar x)^2$$

and since, from Sec. 1, $E\bar x = \bar X$, we have

$$\sigma_{\bar x}^2 = E(\bar x - \bar X)^2 \qquad \text{(2.5)}$$

* See Remark 2 for an alternative proof that does not make use of conditional expected value theorems.

Since

$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$\bar x$ is a linear combination of random variables; i.e., $\bar x$ can be written in the form $u = \sum_{i=1}^{n} c_i u_i$ where $c_i = 1/n$ and $u_i = x_i$. Therefore, by Theorem 11, Ch. 3 (p. 56),

$$\sigma_{\bar x}^2 = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_{ij} \qquad \text{(2.6)}$$

where

$$\sigma_{ij} = E(x_i - \bar X)(x_j - \bar X), \qquad i, j = 1, 2, \ldots, n$$

Equation 2.6 can be rewritten

$$\sigma_{\bar x}^2 = \frac{1}{n^2}\sum_{i=1}^{n}\sigma_i^2 + \frac{1}{n^2}\sum_{i \ne j}^{n}\sigma_{ij} \qquad \text{(2.7)}$$

Thus, for a sample of 3, we have

$$\bar x = \tfrac{1}{3}(x_1 + x_2 + x_3)$$

and the variance of $\bar x$ is

$$\sigma_{\bar x}^2 = \tfrac{1}{9}(\sigma_1^2 + \sigma_2^2 + \sigma_3^2 + \sigma_{12} + \sigma_{13} + \sigma_{21} + \sigma_{23} + \sigma_{31} + \sigma_{32}) = \tfrac{1}{9}(\sigma_1^2 + \sigma_2^2 + \sigma_3^2 + 2\sigma_{12} + 2\sigma_{13} + 2\sigma_{23})$$

where $\sigma_1^2$, $\sigma_2^2$, and $\sigma_3^2$ are the variances of the values obtained on the first, second, and third selections, respectively. The remaining terms, $\sigma_{12} = \sigma_{21}$, $\sigma_{13} = \sigma_{31}$, $\sigma_{23} = \sigma_{32}$, are the covariances of the values taken on by the ith and jth selections with i not equal to j. Thus, $\sigma_{12}$ is the covariance between the values taken by the first and second selections. Note that there are 3 variance terms and 6 covariance terms. In general, for a sample of size n, the first term of Eq. 2.7 will consist of n variance terms and the second will consist of $n(n - 1)$ covariance terms. The $\sum_{i \ne j}^{n}$ in Eq. 2.7 denotes the sum over these $n(n - 1)$ covariance terms. Now, since $(x_i - \bar X)^2$ takes on the possible values $(X_1 - \bar X)^2, (X_2 - \bar X)^2, \ldots, (X_N - \bar X)^2$, each with probability $1/N$, we have, for $\sigma_i^2$ in Eq. 2.7,

$$\sigma_i^2 = E(x_i - \bar X)^2 = \frac{1}{N}\sum_{j=1}^{N}(X_j - \bar X)^2 = \sigma^2 \qquad \text{(2.8)}$$

Hence, $\sigma_i^2 = \sigma^2$ for all i.

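Now, to evaluate the second term of Eq. 2.7, we must find

$$\sigma_{ij} = E(x_i - \bar X)(x_j - \bar X)$$

By Corollary 2 of Theorem 14, Ch. 3 (p. 62), we can write

$$E(x_i - \bar X)(x_j - \bar X) = E\left[(x_i - \bar X)\,E\{(x_j - \bar X) \mid (x_i - \bar X)\}\right] \qquad \text{(2.9)}$$

where $E\{(x_j - \bar X) \mid (x_i - \bar X)\}$ means the conditional expected value of $(x_j - \bar X)$ for a fixed value of $(x_i - \bar X)$.

To evaluate this conditional expected value when sampling without replacement, one must in effect list the possible values of $(x_j - \bar X)$ for the jth selection for a fixed value of the ith selection, and determine their probabilities of selection. Thus, for N = 3, if $(x_i - \bar X) = (X_1 - \bar X)$, then $(x_j - \bar X)$ takes on the values $(X_2 - \bar X)$ and $(X_3 - \bar X)$ with probabilities equal to $1/(3 - 1) = 1/2$. In general, the probability of $(x_j - \bar X)$ taking on a particular value on the jth selection other than that of $(x_i - \bar X)$ is

$$\Pr[(x_j - \bar X) = (X_k - \bar X) \mid (x_i - \bar X) = (X_l - \bar X)] = \frac{1}{N - 1}, \qquad k \ne l \qquad \text{(2.10)}$$

since, with the ith selection fixed, the only possible values taken on by $(x_j - \bar X)$ are those left after the value taken on by the ith selection is withdrawn from the population. Hence

$$E[(x_j - \bar X) \mid (x_i - \bar X)] = \frac{1}{N - 1}\left[\sum_{k=1}^{N}(X_k - \bar X) - (x_i - \bar X)\right] \qquad \text{(2.11)}$$

Since

$$\sum_{k=1}^{N}(X_k - \bar X) = \sum_{k=1}^{N} X_k - N\bar X = 0$$

Eq. 2.11 becomes

$$E[(x_j - \bar X) \mid (x_i - \bar X)] = -\frac{1}{N - 1}(x_i - \bar X) \qquad \text{(2.12)}$$

Substituting Eq. 2.12 for $E[(x_j - \bar X) \mid (x_i - \bar X)]$ in Eq. 2.9, we have

$$E(x_i - \bar X)(x_j - \bar X) = -\frac{1}{N - 1}\,E(x_i - \bar X)^2$$

and from Eq. 2.8

$$E(x_i - \bar X)(x_j - \bar X) = -\frac{\sigma^2}{N - 1} \qquad \text{(2.13)}$$

for all i and j with $i \ne j$. Substituting Eq. 2.8 and 2.13 for $\sigma_i^2$ and $\sigma_{ij}$ in Eq. 2.7, we have

$$\sigma_{\bar x}^2 = \frac{n\sigma^2}{n^2} - \frac{n(n - 1)}{n^2}\,\frac{\sigma^2}{N - 1} = \frac{N - n}{N - 1}\,\frac{\sigma^2}{n} \qquad \text{(2.14)}$$

Since

$$\sigma^2 = \frac{N - 1}{N}\,S^2$$

we have

$$\sigma_{\bar x}^2 = \frac{N - n}{N}\,\frac{S^2}{n} = (1 - f)\frac{S^2}{n} \qquad \text{(2.3)}$$

If the sample is selected with replacement, $\sigma_{\bar x}^2$ is equal to the first term of Eq. 2.7, since in that case the draws are independent and

$$\sigma_{ij} = E(x_i - \bar X)(x_j - \bar X) = E(x_i - \bar X)\,E(x_j - \bar X) = 0 \qquad \text{(2.15)}$$

Hence, for sampling with replacement,

$$\sigma_{\bar x}^2 = \frac{\sigma^2}{n} \qquad \text{(2.4)}$$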


Remark 1. For an estimated total, $x' = N\bar x$, the results are

$$\sigma_{x'}^2 = N^2(1 - f)\frac{S^2}{n}$$

when sampling without replacement and

$$\sigma_{x'}^2 = N^2\frac{\sigma^2}{n}$$

when sampling with replacement. This follows from Corollary 4, Theorem 11, Ch. 3 (p. 57).


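Remark 2. Following is an alternate proof which makes use only of the definition of expected value and basic expected value theorems. Returning to Eq. 2.5, we have

$$\sigma_{\bar x}^2 = E(\bar x - \bar X)^2 = E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar X)\right]^2 \qquad \text{(2.16)}$$

Now, the square of an algebraic sum of n items can be written as the sum of the squares of the n individual items plus the sum of the cross products of the $n(n - 1)$ possible ordered pairs of the n items, i.e.,

$$\left[\sum_{i=1}^{n}(x_i - \bar X)\right]^2 = \sum_{i=1}^{n}(x_i - \bar X)^2 + \sum_{i \ne k}^{n}(x_i - \bar X)(x_k - \bar X)$$

(See also Eq. 2.3, Sec. 2, Ch. 2.) Using this expansion in Eq. 2.16, we may write

$$\sigma_{\bar x}^2 = \frac{1}{n^2}E\left[\sum_{i=1}^{n}(x_i - \bar X)^2 + \sum_{i \ne k}^{n}(x_i - \bar X)(x_k - \bar X)\right] = \frac{1}{n^2}\sum_{i=1}^{n}E(x_i - \bar X)^2 + \frac{1}{n^2}\sum_{i \ne k}^{n}E(x_i - \bar X)(x_k - \bar X) \qquad \text{(2.17)}$$

The probability of obtaining $(X_j - \bar X)^2$ on the ith draw is the same as the probability of obtaining $X_j$ on the ith draw, namely $1/N$. Therefore

$$E(x_i - \bar X)^2 = \frac{\sum_{j=1}^{N}(X_j - \bar X)^2}{N} = \sigma^2 \qquad \text{(2.18)}$$

Consider now the second term in the right-hand member of Eq. 2.17. When the sampling is carried out without replacement, the probability of obtaining $(X_j - \bar X)$ on the ith draw and $(X_m - \bar X)$ on the kth draw ($j \ne m$) is $1/[N(N - 1)]$. This follows from the fact that the probability of obtaining $X_j$ on the ith draw is $1/N$ and the probability of obtaining $X_m$ on the kth draw, knowing that $X_j$ has been drawn, is $1/(N - 1)$. Hence

$$E(x_i - \bar X)(x_k - \bar X) = \frac{\sum_{j \ne m}^{N}(X_j - \bar X)(X_m - \bar X)}{N(N - 1)} \qquad \text{(2.19)}$$

when the sampling is without replacement. Substituting Eq. 2.18 and 2.19 into Eq. 2.17 gives

$$\sigma_{\bar x}^2 = \frac{\sigma^2}{n} + \frac{n(n - 1)}{n^2}\,\frac{\sum_{j \ne m}^{N}(X_j - \bar X)(X_m - \bar X)}{N(N - 1)}$$

But

$$\sum_{j \ne m}^{N}(X_j - \bar X)(X_m - \bar X) = \left[\sum_{j=1}^{N}(X_j - \bar X)\right]^2 - \sum_{j=1}^{N}(X_j - \bar X)^2 = -N\sigma^2$$

and therefore

$$\sigma_{\bar x}^2 = \frac{\sigma^2}{n} - \frac{n(n - 1)}{n^2}\,\frac{\sigma^2}{N - 1} = \frac{N - n}{N - 1}\,\frac{\sigma^2}{n} = \frac{N - n}{N}\,\frac{S^2}{n}$$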

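3. The covariance and correlation of arithmetic means for a simple random sample (Vol. I, Ch. 4, Sec. 18). To prove: The covariance, $\sigma_{\bar x \bar y}$, of $\bar x$ and $\bar y$ in a simple random sample of n units selected from a population of N units without replacement is

$$\sigma_{\bar x \bar y} = (1 - f)\frac{S_{XY}}{n} \qquad \text{(3.1)}$$

and if the sample is selected with replacement, the covariance is $\sigma_{XY}/n$, where

$$\sigma_{XY} = \frac{\sum_{j=1}^{N}(X_j - \bar X)(Y_j - \bar Y)}{N} = \frac{N - 1}{N}\,S_{XY}$$

with $S_{XY}$ given by

$$S_{XY} = \frac{\sum_{j=1}^{N}(X_j - \bar X)(Y_j - \bar Y)}{N - 1} \qquad \text{(3.2)}$$

$X_j$ is the value of a characteristic for the jth unit in the population, $Y_j$ is the value of another characteristic for the jth unit in the population, $\bar x$ and $\bar X$ are defined in Eq. 1.1, and $\bar y$ and $\bar Y$ are similarly defined.

For simple random sampling, the correlation between two sample means $\bar x$ and $\bar y$ is independent of the sample size and is equal to the correlation between individual observations, i.e.,

$$\rho_{\bar x \bar y} = \rho_{XY} = \frac{S_{XY}}{S_X S_Y} \qquad \text{(3.3)}$$

Note that $\sigma_{\bar x}^2$, which was derived in Sec. 2, is a special case of $\sigma_{\bar x \bar y}$, since $\sigma_{\bar x}^2 = \sigma_{\bar x \bar x}$, where $\sigma_{\bar x \bar x}$ is defined by Eq. 3.1 with x substituted for y (X substituted for Y throughout). Similarly,

$$S_{XX} = S_X^2 \quad \text{and} \quad S_{YY} = S_Y^2$$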

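Proof. The proof follows steps analogous to those given in deriving the variance in Sec. 2. Thus, since $\bar x$ and $\bar y$ are linear combinations of random variables, the covariance of $\bar x$ and $\bar y$ is (by Theorem 12, Ch. 3)

$$\sigma_{\bar x \bar y} = \frac{1}{n^2}\sum_{i=1}^{n}\sigma_{x_i y_i} + \frac{1}{n^2}\sum_{i \ne j}^{n}\sigma_{x_i y_j} \qquad \text{(3.4)}$$

and since

$$\sigma_{x_i y_i} = \frac{\sum_{k=1}^{N}(X_k - \bar X)(Y_k - \bar Y)}{N} = \sigma_{XY} \qquad \text{(3.5)}$$

and

$$\sigma_{x_i y_j} = -\frac{\sigma_{XY}}{N - 1} \qquad \text{for } i \ne j \qquad \text{(3.6)}$$

therefore, by substitution in Eq. 3.4, we have

$$\sigma_{\bar x \bar y} = \frac{1}{n^2}\,n\sigma_{XY} - \frac{1}{n^2}\,n(n - 1)\frac{\sigma_{XY}}{N - 1} = \frac{N - n}{Nn}\,\frac{\sum_{j=1}^{N}(X_j - \bar X)(Y_j - \bar Y)}{N - 1} = \frac{N - n}{Nn}\,S_{XY} = (1 - f)\frac{S_{XY}}{n} \qquad \text{(3.1)}$$

where $S_{XY}$ is given by Eq. 3.2. Now, by definition,

$$\rho_{\bar x \bar y} = \frac{\sigma_{\bar x \bar y}}{\sigma_{\bar x}\sigma_{\bar y}} \qquad \text{(3.7)}$$

and from Eq. 2.3

$$\sigma_{\bar x}\sigma_{\bar y} = (1 - f)\frac{S_X S_Y}{n} \qquad \text{(3.8)}$$

and therefore, substituting Eq. 3.8 and 3.1 in Eq. 3.7, we have

$$\rho_{\bar x \bar y} = \frac{S_{XY}}{S_X S_Y} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \rho_{XY} \qquad \text{(3.3)}$$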
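The same enumeration idea checks Eq. 3.1 exactly and Eq. 3.3 to floating-point accuracy. The paired values X and Y below are hypothetical, and sampling is without replacement.

```python
import math
from fractions import Fraction
from itertools import permutations

def exact_cov_of_means(X, Y, n):
    """Exact covariance of xbar and ybar over all ordered samples drawn without replacement."""
    N = len(X)
    draws = list(permutations(range(N), n))
    w = Fraction(1, len(draws))
    Xbar, Ybar = Fraction(sum(X), N), Fraction(sum(Y), N)
    return sum(w * (Fraction(sum(X[j] for j in d), n) - Xbar)
                 * (Fraction(sum(Y[j] for j in d), n) - Ybar) for d in draws)

def S_xy(X, Y):
    """Population covariance with divisor N - 1 (Eq. 3.2)."""
    N = len(X)
    Xbar, Ybar = Fraction(sum(X), N), Fraction(sum(Y), N)
    return sum((x - Xbar) * (y - Ybar) for x, y in zip(X, Y)) / (N - 1)

def corr(cxy, cxx, cyy):
    """Correlation from a covariance and two variances."""
    return float(cxy) / math.sqrt(float(cxx) * float(cyy))

X = [3, 8, 1, 9, 4, 6]        # hypothetical paired characteristics
Y = [2, 7, 2, 6, 5, 9]
n, N = 3, len(X)
cov_means = exact_cov_of_means(X, Y, n)
rho_means = corr(cov_means, exact_cov_of_means(X, X, n), exact_cov_of_means(Y, Y, n))
rho_pop = corr(S_xy(X, Y), S_xy(X, X), S_xy(Y, Y))
print(cov_means, (1 - Fraction(n, N)) * S_xy(X, Y) / n, rho_means, rho_pop)
```

The common factor $(1 - f)/n$ cancels in the correlation, which is why the two correlations agree for every n.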


4. The mathematical expectation of the sample variance for a simple random sample (Vol. I, Ch. 4, Sec. 12). To prove: In a simple random sample of n elements, $s^2$ is an unbiased estimate of $S^2$ if the sample is drawn without replacement, and $s^2$ is an unbiased estimate of $\sigma^2$ if the sample is drawn with replacement; i.e., to prove:

(a) $Es^2 = S^2$ if the sample is selected without replacement.

(b) $Es^2 = \sigma^2$ if the sample is selected with replacement,

where

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar x)^2}{n - 1} \qquad \text{(4.1 or I-4.3.12)}$$

and $S^2$ and $\sigma^2$ are defined by Eq. 2.1 and 2.2, respectively.
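Proof. We note that

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar x^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2}{n - 1} - \frac{n\bar x^2}{n - 1} \qquad \text{(4.2)}$$

and that

$$E\,\frac{\sum_{i=1}^{n} x_i^2}{n - 1} = \frac{n}{n - 1}\,\frac{\sum_{j=1}^{N} X_j^2}{N} \qquad \text{(4.3)}$$

Now

$$E\bar x^2 = \sigma_{\bar x}^2 + \bar X^2 \qquad \text{(4.4)}$$

and, when the sampling is without replacement, using $\sigma_{\bar x}^2$ from Eq. 2.14,

$$E\,\frac{n\bar x^2}{n - 1} = \frac{n}{n - 1}\left[\frac{N - n}{(N - 1)n}\,\sigma^2 + \bar X^2\right] \qquad \text{(4.5)}$$

Hence, substituting Eq. 4.3 and Eq. 4.5 in Eq. 4.2, and recognizing that

$$\sigma^2 = \frac{\sum_{j=1}^{N} X_j^2}{N} - \bar X^2$$

we have

$$Es^2 = \frac{n}{n - 1}\left[\sigma^2 - \frac{N - n}{(N - 1)n}\,\sigma^2\right] = \frac{N}{N - 1}\,\sigma^2 = S^2$$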


As an exercise, show that when the sampling is with replacement $Es^2 = \sigma^2$.
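The enumeration below checks both parts of Sec. 4 (and the exercise) exactly for a hypothetical population. It is only a finite check of the stated identities, not a proof.

```python
from fractions import Fraction
from itertools import permutations, product

def population_variances(population):
    """Return (S^2, sigma^2) with divisors N - 1 and N (Eq. 2.1, Eq. 2.2)."""
    N = len(population)
    Xbar = Fraction(sum(population), N)
    ss = sum((x - Xbar) ** 2 for x in population)
    return ss / (N - 1), ss / N

def expected_s2(population, n, replace):
    """Exact E s^2 over all equally likely ordered samples (s^2 as in Eq. 4.1)."""
    N = len(population)
    draws = list(product(range(N), repeat=n) if replace else permutations(range(N), n))
    w = Fraction(1, len(draws))
    total = Fraction(0)
    for d in draws:
        xs = [population[j] for j in d]
        xbar = Fraction(sum(xs), n)
        total += w * sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return total

population = [3, 8, 1, 9, 4, 6]          # hypothetical population
S2, sigma2 = population_variances(population)
print(S2, expected_s2(population, 3, False), sigma2, expected_s2(population, 3, True))
```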

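5. The rel-variance of the estimated variance for a simple random sample* (Vol. I, Ch. 4, Sec. 12). To prove: The rel-variance, $V_{s^2}^2$, of the estimated variance, $s^2$, of a simple random sample of n units selected with replacement is given by

$$V_{s^2}^2 = \frac{1}{n}\left(\beta - \frac{n - 3}{n - 1}\right) \doteq \frac{\beta - 1}{n} \qquad \text{(5.1)}$$

where

$$\beta = \frac{\mu_4}{\sigma^4}$$

and

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar x)^2}{n - 1}, \qquad \mu_4 = \frac{\sum_{j=1}^{N}(X_j - \bar X)^4}{N}, \qquad \sigma^2 = \frac{\sum_{j=1}^{N}(X_j - \bar X)^2}{N} \qquad \text{(5.2 or I-4.12.7)}$$

Proof. By definition

$$V_{s^2}^2 = \frac{E(s^2 - Es^2)^2}{(Es^2)^2} = \frac{E(s^2 - \sigma^2)^2}{\sigma^4} = \frac{Es^4 - \sigma^4}{\sigma^4} \qquad \text{(5.3)}$$

since, by Sec. 4, $Es^2 = \sigma^2$.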


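We now need to evaluate $Es^4$. Apply the following transformation:

$$x_i - \bar X = z_i, \qquad \bar x - \bar X = \bar z$$

Then

$$Es^4 = E\left[\frac{\sum_{i=1}^{n}(z_i - \bar z)^2}{n - 1}\right]^2 = \frac{1}{(n - 1)^2}\left[E\Big(\sum z_i^2\Big)^2 - 2nE\Big(\bar z^2\sum z_i^2\Big) + n^2E\bar z^4\right] \qquad \text{(5.4)}$$

Now

$$\Big(\sum_{i=1}^{n} z_i^2\Big)^2 = \sum_{i=1}^{n} z_i^4 + \sum_{i \ne j}^{n} z_i^2 z_j^2 \qquad \text{(5.5)}$$

where there are $n^2$ terms on the left, n terms in the first sum, and $n^2 - n$ terms in the second sum on the right. Also

$$\bar z^2\sum z_i^2 = \frac{1}{n^2}\left[\sum z_i^4 + 2\sum_{i \ne j} z_i^3 z_j + \sum_{i \ne j} z_i^2 z_j^2 + \sum_{i \ne j \ne k} z_i^2 z_j z_k\right] \qquad \text{(5.6)}$$

where there are $n^3$ terms in the expansion on the left, and n terms in the first sum, $n(n - 1)$ terms in each of the next two sums, and $n(n - 1)(n - 2)$ terms in the fourth sum on the right. Finally

$$\bar z^4 = \frac{1}{n^4}\left[\sum z_i^4 + 4\sum_{i \ne j} z_i^3 z_j + 3\sum_{i \ne j} z_i^2 z_j^2 + 6\sum_{i \ne j \ne k} z_i^2 z_j z_k + \sum_{i \ne j \ne k \ne m} z_i z_j z_k z_m\right] \qquad \text{(5.7)}$$

where there are $n^4$ terms in the expansion on the left and $n$, $n(n - 1)$, $n(n - 1)$, $n(n - 1)(n - 2)$, and $n(n - 1)(n - 2)(n - 3)$ terms in the five summations on the right, respectively.

* May be taken up with Chapter 10 instead of Chapter 4.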

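Since we are sampling with replacement, $z_1, \ldots, z_n$ are independent, and hence, by Theorem 8 of Ch. 3 (p. 54), we have

$$\begin{aligned}
Ez_i^2 z_j^2 &= Ez_i^2\,Ez_j^2 = \sigma^4 && \text{if } i \ne j\\
Ez_i^3 z_j &= Ez_i^3\,Ez_j = 0 && \text{if } i \ne j, \text{ since } Ez_j = 0\\
Ez_i^2 z_j z_k &= Ez_i^2\,Ez_j\,Ez_k = 0 && \text{if } i \ne j \ne k, \text{ since } Ez_j = 0\\
Ez_i z_j z_k z_m &= Ez_i\,Ez_j\,Ez_k\,Ez_m = 0 && \text{if } i \ne j \ne k \ne m, \text{ since } Ez_i = 0
\end{aligned} \qquad \text{(5.8)}$$

Hence, from Eq. 5.5 and Eq. 5.8,

$$E\Big(\sum z_i^2\Big)^2 = n\mu_4 + n(n - 1)\sigma^4$$

where $\mu_4 = Ez_i^4$ and $\sigma^4 = (Ez_i^2)^2$. From Eq. 5.6 and Eq. 5.8,

$$E\Big(\bar z^2\sum z_i^2\Big) = \frac{1}{n^2}\left[n\mu_4 + n(n - 1)\sigma^4\right] = \frac{\mu_4}{n} + \frac{n - 1}{n}\,\sigma^4$$

From Eq. 5.7 and Eq. 5.8,

$$E\bar z^4 = \frac{1}{n^4}\left[n\mu_4 + 3n(n - 1)\sigma^4\right] = \frac{\mu_4}{n^3} + \frac{3(n - 1)}{n^3}\,\sigma^4$$

Substituting these quantities in Eq. 5.4 and subtracting $\sigma^4$, we have

$$Es^4 - \sigma^4 = \frac{1}{(n - 1)^2}\left[n\mu_4 + n(n - 1)\sigma^4 - 2\mu_4 - 2(n - 1)\sigma^4 + \frac{\mu_4}{n} + \frac{3(n - 1)}{n}\,\sigma^4\right] - \sigma^4$$
$$= \frac{\mu_4}{n} - \frac{\sigma^4}{n}\,\frac{n - 3}{n - 1} \qquad \text{(5.9)}$$

Finally

$$V_{s^2}^2 = \frac{Es^4 - \sigma^4}{\sigma^4} = \frac{1}{n}\left(\beta - \frac{n - 3}{n - 1}\right) \qquad \text{(5.10)}$$

For reasonably large n we may assume $(n - 3)/(n - 1) \doteq 1$, in which case $V_{s^2}^2$ is given approximately by

$$V_{s^2}^2 \doteq \frac{\beta - 1}{n} \qquad \text{(5.1)}$$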
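Because Eq. 5.10 is exact for sampling with replacement, it can be compared with a full enumeration of all $N^n$ equally likely ordered samples. This is a minimal sketch on a hypothetical population, with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def central_moments(population):
    """Return (sigma^2, mu_4, beta) with divisor N, where beta = mu_4 / sigma^4."""
    N = len(population)
    Xbar = Fraction(sum(population), N)
    sigma2 = sum((x - Xbar) ** 2 for x in population) / N
    mu4 = sum((x - Xbar) ** 4 for x in population) / N
    return sigma2, mu4, mu4 / sigma2 ** 2

def exact_rel_variance_s2(population, n):
    """(E s^4 - sigma^4) / sigma^4 over all N**n ordered samples drawn with replacement."""
    N = len(population)
    sigma2, _, _ = central_moments(population)
    draws = list(product(range(N), repeat=n))
    w = Fraction(1, len(draws))
    Es4 = Fraction(0)
    for d in draws:
        xs = [population[j] for j in d]
        xbar = Fraction(sum(xs), n)
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        Es4 += w * s2 ** 2
    return (Es4 - sigma2 ** 2) / sigma2 ** 2

def eq_5_10(population, n):
    """Right-hand side of Eq. 5.10."""
    _, _, beta = central_moments(population)
    return (beta - Fraction(n - 3, n - 1)) / n

population = [1, 2, 2, 5, 9]      # hypothetical population
for n in (2, 3, 4):
    print(n, exact_rel_variance_s2(population, n), eq_5_10(population, n))
```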


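Remark 1. If the population mean were known and were used in estimating $\sigma^2$, then

$$s'^2 = \frac{\sum_{i=1}^{n}(x_i - \bar X)^2}{n} \qquad \text{(5.11)}$$

is an unbiased estimate of $\sigma^2$ and

$$V_{s'^2}^2 = \frac{\beta - 1}{n} \qquad \text{(5.12)}$$

The proof, which follows the steps above, is left to the reader.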

Remark 2. It is tedious but not very difficult to obtain a similar result for sampling without replacement. The result is:

y,- - ~ -r 1--[(„ _ 2)(„ _ 3) _ _ I)] 

/S 

' ^ y — 

[(«- 1)2 + 2 ] 


V2(n-l)2| „ n{N~-l) 

4fa - \)in - 2)(« ~ 3) 6(/» - Din - l)(n 3) 

«(V-l)(7V-2) «(V-l)(Ar-2)(iV-3) 

(V- 1)2 K«- 1)V 


+ 


N\n- 1)2 [n(iV- 1) 

2(n~ 1)(»- 2)(«- 3)1V 3(« - 1)(« - 2)(« - 3)V 

^ n(N- \)(N~ 2)(V- 3) 

N\n- 1)2 
(V- 1) 


«(V-1)(V~ 2) 

] 


(5.13) 


From a comparison of Eq. 5.10 and 5.13 it will be apparent why we use the result for sampling with replacement in approximating the variance of $s^2$ whenever possible.


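Remark 3. When sampling with replacement, the expected value of any term in which $z_i$ appears only to the first power is zero. Hence, an easier development of $Es^4$ can be made by recognizing that

$$s^2 = \frac{\sum_{i=1}^{n} z_i^2}{n} - \frac{\sum_{i \ne j}^{n} z_i z_j}{n(n - 1)}$$

and that, on squaring, the cross-product term has expectation zero, so that

$$Es^4 = E\left(\frac{\sum z_i^2}{n}\right)^2 + E\left[\frac{\sum_{i \ne j} z_i z_j}{n(n - 1)}\right]^2$$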

6. The rel-variance of the estimated standard deviation* (Vol. I, Ch. 4, Sec. 12, Eq. 12.5). To prove: The rel-variance, $V_s^2$, of the estimated standard deviation, s, is given approximately by

$$V_s^2 \doteq \frac{V_{s^2}^2}{4} \qquad \text{(6.1)}$$

and it is desired to investigate the order of approximation.

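Proof. Although the proof is expressed in terms of s, the results are valid for any positive random variable s.

Let $Es^2 = \sigma^2$. Then, since

$$E(s - \sigma)^2 = Es^2 - 2\sigma Es + \sigma^2 = 2\sigma(\sigma - Es)$$

it follows that

$$\frac{E(s - \sigma)^2}{2\sigma^2} = \frac{\sigma - Es}{\sigma} \qquad \text{(6.2)}$$

and hence the investigations of approximations to the bias of s and to the rel-variance of s may be carried on simultaneously. Furthermore, Eq. 6.2 shows that for any random variable s such that $Es^2 = \sigma^2$ it follows that

$$\frac{\sigma - Es}{\sigma} \ge 0 \qquad \text{(6.3)}$$

so that the bias, $Es - \sigma$, is negative or zero; i.e., the expected value of the estimated standard deviation is never greater than the standard deviation, and is less unless s is constant.

It is not difficult to verify, by using the fact that $s^2 - \sigma^2 = (s + \sigma)(s - \sigma)$, that the following is true for all values $s \ge 0$ and $\sigma > 0$:

$$\frac{s - \sigma}{\sigma} = \frac{s^2 - \sigma^2}{2\sigma^2} - \frac{(s^2 - \sigma^2)^2}{8\sigma^4} + \frac{(s^2 - \sigma^2)^3}{16\sigma^6} - \frac{(s^2 - \sigma^2)^4(s^2 + 4s\sigma + 5\sigma^2)}{16\sigma^6(s + \sigma)^4} \qquad \text{(6.4)}$$

* May be taken up with Chapter 10 instead of Chapter 4.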




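Since $Es^2 = \sigma^2$, it follows from Eq. 6.4 that

$$\frac{Es - \sigma}{\sigma} = -\frac{E(s^2 - \sigma^2)^2}{8\sigma^4} + \frac{E(s^2 - \sigma^2)^3}{16\sigma^6} - \frac{1}{16\sigma^6}\,E\,\frac{(s^2 - \sigma^2)^4(s^2 + 4s\sigma + 5\sigma^2)}{(s + \sigma)^4} \qquad \text{(6.5)}$$

Therefore, from Eq. 6.5, since $V_{s^2}^2 = E(s^2 - \sigma^2)^2/\sigma^4$, we have

$$\frac{\sigma - Es}{\sigma} = \frac{V_{s^2}^2}{8} - \frac{E(s^2 - \sigma^2)^3}{16\sigma^6} + \frac{1}{16\sigma^6}\,E\,\frac{(s^2 - \sigma^2)^4(s^2 + 4s\sigma + 5\sigma^2)}{(s + \sigma)^4}$$

Since

$$\frac{E(s - \sigma)^2}{\sigma^2} = \frac{2(\sigma - Es)}{\sigma}$$

from Eq. 6.2, we have

$$\frac{E(s - \sigma)^2}{\sigma^2} = \frac{V_{s^2}^2}{4} - \frac{E(s^2 - \sigma^2)^3}{8\sigma^6} + \frac{1}{8\sigma^6}\,E\,\frac{(s^2 - \sigma^2)^4(s^2 + 4s\sigma + 5\sigma^2)}{(s + \sigma)^4} \qquad \text{(6.6)}$$

Now, for $s \ge 0$,

$$1 < \frac{s^2 + 4s\sigma + 5\sigma^2}{(s + \sigma)^2} \le 5 \qquad \text{and} \qquad (s + \sigma)^2 \ge \sigma^2$$

so that

$$0 \le \frac{1}{8\sigma^6}\,E\,\frac{(s^2 - \sigma^2)^4(s^2 + 4s\sigma + 5\sigma^2)}{(s + \sigma)^4} \le \frac{5}{8}\,\frac{E(s^2 - \sigma^2)^4}{\sigma^8}$$

For sufficiently large n, $E(s^2 - \sigma^2)^3/\sigma^6$ and $E(s^2 - \sigma^2)^4/\sigma^8$ are negligible relative to $V_{s^2}^2/4$, and we can say that $V_s^2$ is given approximately by $V_{s^2}^2/4$. This is the approximation to $V_s^2$ for any sample design. For a simple random sample of n units we can substitute Eq. 5.10 for $V_{s^2}^2$ to obtain

$$V_s^2 \doteq \frac{1}{4n}\left(\beta - \frac{n - 3}{n - 1}\right) \doteq \frac{\beta - 1}{4n} \qquad \text{(6.7)}$$

or

$$V_s \doteq \sqrt{\frac{\beta - 1}{4n}} \qquad \text{(6.8 or I-4.12.5)}$$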
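Two exact facts from Sec. 6 can be checked numerically. The identity in Eq. 6.4 is tested in rational arithmetic over a grid of values. The relation in Eq. 6.2, together with the sign of the bias in Eq. 6.3, is tested for a hypothetical discrete s whose $\sigma^2$ is defined as $Es^2$. The function names are illustrative only.

```python
import math
from fractions import Fraction

def eq_6_4_rhs(s, sigma):
    """Right-hand side of Eq. 6.4, evaluated exactly for rational s >= 0, sigma > 0."""
    d = s * s - sigma * sigma
    return (d / (2 * sigma ** 2)
            - d ** 2 / (8 * sigma ** 4)
            + d ** 3 / (16 * sigma ** 6)
            - d ** 4 * (s * s + 4 * s * sigma + 5 * sigma ** 2)
              / (16 * sigma ** 6 * (s + sigma) ** 4))

def relative_bias(values):
    """For s equally likely to be each entry of `values` (assumed >= 0), put
    sigma^2 = E s^2 and return both sides of Eq. 6.2:
    ((sigma - E s) / sigma, E(s - sigma)^2 / (2 sigma^2))."""
    k = len(values)
    sigma = math.sqrt(sum(v * v for v in values) / k)
    Es = sum(values) / k
    lhs = (sigma - Es) / sigma
    rhs = sum((v - sigma) ** 2 for v in values) / k / (2 * sigma ** 2)
    return lhs, rhs

sig = Fraction(3, 2)
print(all(eq_6_4_rhs(Fraction(a, 4), sig) == (Fraction(a, 4) - sig) / sig for a in range(17)))
print(relative_bias([1.0, 2.0, 3.0]))
```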


7. A condition under which the approximation to the coefficient of variation of the estimated standard deviation is reasonably satisfactory (Vol. I, Ch. 4, Sec. 12). The approximation to the coefficient of variation of the estimated standard deviation, $V_s \doteq V_{s^2}/2$, as obtained in the preceding section, will be a reasonably good approximation for sizes of sample such that $V_{s^2}$ is less than .3. This is indicated by the following line of reasoning.

Note that, if we use the approximation to $V_s$ given in Sec. 6 of this chapter, an approximation to the confidence limits (limits of sampling variation) on s with a reasonably large sample would be

$$\sigma\left(1 \pm \tfrac{1}{2}tV_{s^2}\right)$$

Another formula for obtaining confidence bounds for s is

$$\sigma\left(1 \pm tV_{s^2}\right)^{1/2} \qquad \text{(7.1)}$$

This follows from the confidence limits for $s^2$, which are $\sigma^2(1 \pm tV_{s^2})$, and from the fact that s and $s^2$ are mathematically dependent.

Values of $V_{s^2}$ for several levels of error in the approximation to the confidence bounds are given in the accompanying table for t = 1, 2, and 3. In the table r is the ratio of the approximation to the confidence limits to the limits given by Eq. 7.1, i.e., for the lower limits,

$$r = \frac{1 - \tfrac{1}{2}tV_{s^2}}{\sqrt{1 - tV_{s^2}}}$$

and any value of $V_{s^2}$ less than that shown in the table will yield a smaller value of r than is indicated in the table for the corresponding values of t and r. Hence, except when a very high probability is demanded for the confidence bounds covering the true value, such as that associated with t = 3, $V_s \doteq V_{s^2}/2$ will be a reasonably good approximation for sizes of sample such that $V_{s^2}$ is less than .3.




Values of $V_{s^2}$ for given values of r and t

    t    r = 1.10   r = 1.12   r = 1.14   r = 1.16   r = 1.18
    1      .59        .62        .65        .67        .69
    2      .29        .31        .32        .34        .35
    3      .19        .21        .22        .22        .23
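The ratio r in the table can be recomputed directly. The Python sketch below assumes that the tabled entries are values of $V_{s^2}$ and that r compares the lower limits $S(1 - \tfrac{1}{2}tV_{s^2})$ and $S(1 - tV_{s^2})^{1/2}$. The helper names are hypothetical, and `vs2_approx` simply evaluates Eq. 6.7 and 6.8.

```python
import math


def vs2_approx(beta: float, n: int, large_n: bool = False) -> float:
    """Approximate V_s^2 for the sample standard deviation s.

    Uses Eq. 6.7, or Eq. 6.8, (beta - 1)/(4n), when large_n is True.
    beta = mu_4 / sigma^4 is the population kurtosis.
    """
    if large_n:
        return (beta - 1.0) / (4.0 * n)
    return (beta - (n - 3.0) / (n - 1.0)) / (4.0 * n)


def limit_ratio(t: float, v_s2: float) -> float:
    """Ratio r of the approximate lower limit S(1 - t*V_{s^2}/2) to the
    lower limit S*(1 - t*V_{s^2})**0.5 of Eq. 7.1 (needs t*V_{s^2} < 1)."""
    return (1.0 - 0.5 * t * v_s2) / math.sqrt(1.0 - t * v_s2)


# Tabled V_{s^2} values: rows t = 1, 2, 3; columns r = 1.10, ..., 1.18
R_COLUMNS = [1.10, 1.12, 1.14, 1.16, 1.18]
TABLE = {
    1: [0.59, 0.62, 0.65, 0.67, 0.69],
    2: [0.29, 0.31, 0.32, 0.34, 0.35],
    3: [0.19, 0.21, 0.22, 0.22, 0.23],
}

if __name__ == "__main__":
    for t, row in TABLE.items():
        recomputed = [round(limit_ratio(t, v), 3) for v in row]
        print(f"t = {t}: r recomputed from tabled values -> {recomputed}")
```

Recomputing r from the two-decimal entries reproduces the column headings to within about .01. The remaining differences reflect the rounding of the tabled values.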


* May be taken up with Chapter 10 instead of Chapter 4. 




8. The rel-variance of the estimated standard deviation for a simple random sample drawn from a binomial distribution (Vol. I, Ch. 4, Sec. 12). To prove: For a binomial distribution, the square of the coefficient of variation of the estimated standard deviation, s, in a simple random sample of n elements drawn with replacement is given approximately by

$$V_s^2 \doteq \frac{1}{4n}\left(\frac{1}{PQ} - \frac{4n-6}{n-1}\right) \qquad (8.1)$$

where P is the proportion in the population having a specified characteristic and Q = 1 - P.

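Proof. We saw in Sec. 6 that the coefficient of variation squared of the estimated standard deviation, s, in a simple random sample of n elements drawn with replacement is given approximately by

$$V_s^2 \doteq \frac{1}{4n}\left(\beta - \frac{n-3}{n-1}\right) \qquad (8.2)$$

where $\beta = \mu_4/\sigma^4$. For a binomial distribution (each $X_i$ equal to 1 or 0, with mean P):

$$\mu_4 = \frac{\sum^N (X_i - P)^4}{N} = P(1-P)^4 + Q(0-P)^4 = PQ(Q^3 + P^3) \qquad (8.3)$$

and therefore, since $P^3 + Q^3 = (P+Q)^3 - 3PQ(P+Q) = 1 - 3PQ$,

$$\mu_4 = PQ - 3P^2Q^2, \qquad \sigma^2 = PQ \qquad (8.4)$$

Substituting for $\beta = \frac{1}{PQ} - 3$ in $V_s^2$ gives

$$V_s^2 \doteq \frac{1}{4n}\left(\frac{1}{PQ} - 3 - \frac{n-3}{n-1}\right) = \frac{1}{4n}\left(\frac{1}{PQ} - \frac{4n-6}{n-1}\right) \qquad (8.5)$$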


9. Size of sample required to estimate the standard deviation of a proportion with a prescribed precision, say $V_s = .1$ (Vol. I, Ch. 4, Sec. 12, p. 131). a. To prove: For n of 60 or more, $V_s$ is less than .1 if $.20 \le P \le .80$, and it is reasonable to assume that $V_s$ will be less than .1 when the sample proportion, p, lies between .30 and .70.

Proof. For a binomial distribution, from Eq. 8.5,

$$V_s^2 \doteq \frac{1}{4n}\left(\frac{1}{PQ} - \frac{4n-6}{n-1}\right) \qquad (9.1)$$

Note that $V_s^2$ decreases as PQ increases, for a fixed n. If we fix the size of sample at n = 60, it follows from Eq. 9.1 that $V_s$ will be less than .1 for $.20 \le P \le .80$ (at P = .20, Eq. 9.1 gives $V_s^2 \doteq .0095$, i.e., $V_s \doteq .098$). Moreover, when the sample proportion, p, lies between .30 and .70, the population proportion, P, is likely to lie between .20 and .80. Thus, when p = .30 for a sample of 60, then

$$p - 2\sigma_p \doteq .30 - 2\sqrt{\frac{PQ}{n}} = .30 - 2\sqrt{\frac{.20 \times .80}{60}} \doteq .20$$

and therefore, when p = .30, it is likely that P is greater than .20. Similarly, when p = .70, then $.70 + 2\sigma_p \doteq .70 + .10 = .80$ and it is likely that P is less than .80. Therefore, we can be reasonably sure that a sample of 60 will be sufficient to estimate the standard deviation of a sample proportion or a total which is between 30 and 70 per cent of the population with a coefficient of variation of less than 10 per cent.

b. To prove: For nP greater than 25, $V_s$ is less than .1, and it is reasonable to assume that $V_s$ is less than .1 if np is 35 or greater.

Proof. From Eq. 9.1 above, solving for n, we obtain

$$n = \frac{\left(\frac{1}{PQ} + 4V_s^2 - 4\right) + \sqrt{\left(\frac{1}{PQ} + 4V_s^2 - 4\right)^2 - 16V_s^2\left(\frac{1}{PQ} - 6\right)}}{8V_s^2}$$

and

$$\lim_{P \to 0} nP = \frac{1}{4V_s^2}$$

since Q tends to 1, and we have $\lim nP = 25$ for $V_s = .1$. Further, it is reasonable to expect that nP will be greater than 25 when np is greater than 35, since

$$np - 2\sigma_{np} \doteq 35 - 2\sqrt{nPQ} \doteq 35 - 2\sqrt{25} = 35 - 10 = 25$$

and therefore nP is likely to exceed 25.
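Eq. 9.1 and its solution for n are easy to check numerically. The following Python sketch evaluates $V_s^2$ for a binomial characteristic and the sample size needed for a target $V_s$. It assumes sampling with replacement, as in Sec. 8, and the function names are illustrative only.

```python
import math


def vs2_binomial(P: float, n: float) -> float:
    """Eq. 9.1: approximate V_s^2 of the estimated standard deviation
    of a binomial characteristic with population proportion P."""
    PQ = P * (1.0 - P)
    return (1.0 / PQ - (4.0 * n - 6.0) / (n - 1.0)) / (4.0 * n)


def n_required(P: float, v: float) -> float:
    """Sample size n at which vs2_binomial(P, n) equals v**2.

    Eq. 9.1 rearranges to the quadratic
        4 v^2 n^2 - (1/PQ + 4 v^2 - 4) n + (1/PQ - 6) = 0,
    and the larger root is the meaningful one.
    """
    a = 1.0 / (P * (1.0 - P))
    b = a + 4.0 * v * v - 4.0
    disc = b * b - 16.0 * v * v * (a - 6.0)
    return (b + math.sqrt(disc)) / (8.0 * v * v)


if __name__ == "__main__":
    print(f"V_s at P = .20, n = 60: {math.sqrt(vs2_binomial(0.20, 60)):.4f}")
    print(f"n needed for V_s = .1 at P = .20: {n_required(0.20, 0.1):.1f}")
    for P in (0.01, 0.001, 0.0001):
        print(f"P = {P}: nP = {P * n_required(P, 0.1):.2f}")
```

As P becomes small the printed nP approaches 25, in line with the limit $1/(4V_s^2)$.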


10. A simple random sample of a population contains a simple random sample of any subset of the population (Vol. I, Ch. 4, Sec. 7). Suppose that a simple random sample of n elements is drawn from a population of N elements. Let K be the number of elements belonging to some subset of the N elements in the population, and let k be the number of elements in the sample of n belonging to that subset. Then the sample of k is a simple random sample of the K elements, with the same expected sampling fraction as the sample of n. That is, to prove:

a. All combinations of k elements are possible samples and are equally likely.

b. $E(k/K) = n/N$.






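Proof. a. Fix the value of k at $k_0$. Then any particular combination of $k_0$ elements among the possible samples of $k_0$ from the subset will occur exactly $C_{n-k_0}^{N-K}$ times among the $C_n^N$ possible samples of n elements from N. Therefore, all samples of $k_0$ are equally likely.

b. Let $x_i = 1$ if the ith element belongs to the subset, and $x_i = 0$ otherwise. Then $\sum^n x_i = k$ and $\sum^N x_i = K$. Hence, since the sample mean $\sum^n x_i/n$ is an unbiased estimate of K/N (Sec. 1),

$$E\left(\frac{k}{K}\right) = \frac{E\sum^n x_i}{K} = \frac{n(K/N)}{K} = \frac{n}{N}$$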
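Both parts of Sec. 10 can be checked by brute force on a small population. The Python sketch below enumerates every sample of n from N and tallies how the subset's elements fall. The population size, sample size, and subset are arbitrary illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def subsample_summary(N: int, n: int, subset: set):
    """Enumerate all C(N, n) samples of the units 0..N-1.

    Returns the exact value of E(k/K) and, for each k, the number of
    distinct k-combinations of the subset that occur together with the
    set of frequencies with which they occur.
    """
    K = len(subset)
    samples = list(combinations(range(N), n))
    total_k = 0
    by_k = {}  # k -> Counter of subset-combinations seen in the samples
    for s in samples:
        part = tuple(u for u in s if u in subset)
        total_k += len(part)
        by_k.setdefault(len(part), Counter())[part] += 1
    expected_fraction = Fraction(total_k, len(samples) * K)
    summary = {k: (len(c), set(c.values())) for k, c in by_k.items()}
    return expected_fraction, summary


if __name__ == "__main__":
    N, n, subset = 7, 3, {0, 1, 2, 3}
    frac, summary = subsample_summary(N, n, subset)
    print("E(k/K) =", frac, " n/N =", Fraction(n, N))
    for k, (distinct, freqs) in sorted(summary.items()):
        print(f"k = {k}: {distinct} combinations, each occurring {freqs} times")
```

For each k, every one of the $C_k^K$ combinations appears, and each appears exactly $C_{n-k}^{N-K}$ times, matching part a.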


11. The rel-variance of the ratio of two random variables (Vol. I, Ch. 4, Sec. 18). To prove: The rel-variance of a ratio of two random variables u and w is given approximately by

$$V_{u/w}^2 \doteq V_u^2 + V_w^2 - 2\rho_{uw}V_uV_w \qquad (11.1 \text{ or I-4.18.12})$$

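Proof. An approximation to the variance of a ratio of random variables u/w may be arrived at as follows. Let

$$\Delta u = \frac{u - Eu}{Eu}$$

then

$$u = (Eu)(1 + \Delta u)$$

Similarly,

$$\Delta w = \frac{w - Ew}{Ew}, \qquad w = (Ew)(1 + \Delta w)$$

and

$$V_{u/w}^2 \doteq \frac{E\left(\frac{u}{w} - \frac{Eu}{Ew}\right)^2}{\left(\frac{Eu}{Ew}\right)^2} = E\{(1 + \Delta u)(1 + \Delta w)^{-1} - 1\}^2 \qquad (11.2)$$

Now

$$\frac{1}{1 + \Delta w} = 1 - \Delta w + (\Delta w)^2 - (\Delta w)^3 + \cdots$$

which will converge to $(1 + \Delta w)^{-1}$ if $\Delta w$ is less than 1 in absolute value. When $\Delta w$ converges in probability to zero, it is not necessary that $\Delta w$ be less than 1 in absolute value for all possible values of w in order for the initial terms in this expansion, when substituted for $(1 + \Delta w)^{-1}$ in Eq. 11.2, to yield a useful approximation to the rel-variance. Thus, Eq. 11.2 becomes approximately

$$V_{u/w}^2 \doteq E\{(1 + \Delta u)[1 - \Delta w + (\Delta w)^2] - 1\}^2 = E\{\Delta u - \Delta w + (\Delta w)^2 - \Delta u\Delta w + \Delta u(\Delta w)^2\}^2$$

Retaining terms of the second order or less, and noting that $E\Delta u = E\Delta w = 0$, we have

$$V_{u/w}^2 \doteq E\{(\Delta u)^2 + (\Delta w)^2 - 2\Delta u\Delta w\} = V_u^2 + V_w^2 - 2\rho_{uw}V_uV_w \qquad (11.3)$$

since

$$E\Delta u\Delta w = \frac{E(u - Eu)(w - Ew)}{(Eu)(Ew)} = \rho_{uw}V_uV_w \qquad (11.4)$$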


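To see that Eq. 11.3 provides a useful approximation to the rel-variance of the ratio of sample means, for n sufficiently large, we note first that Eq. 11.3 with $u = \bar x$, $Eu = \bar X$, $w = \bar y$, $Ew = \bar Y$ may be rewritten

$$V_{\bar x/\bar y}^2 \doteq E\left[\frac{(\bar x - \bar X)^2}{\bar X^2} + \frac{(\bar y - \bar Y)^2}{\bar Y^2} - 2\frac{(\bar x - \bar X)(\bar y - \bar Y)}{\bar X\bar Y}\right] = E(D^2) \qquad (11.5)$$

where $D = \frac{\bar x - \bar X}{\bar X} - \frac{\bar y - \bar Y}{\bar Y}$. Also, since $\frac{\bar x/\bar y}{\bar X/\bar Y} - 1 = \frac{\bar Y}{\bar y}D$,

$$\frac{E\left(\frac{\bar x}{\bar y} - \frac{\bar X}{\bar Y}\right)^2}{(\bar X/\bar Y)^2} = E(D^2) + E\left(D^2\,\frac{\bar Y^2 - \bar y^2}{\bar y^2}\right) \qquad (11.6)$$

where the first term is Eq. 11.5, the approximation to the rel-variance. We need merely show that, for some value of n sufficiently large, the second term of Eq. 11.6 becomes very small relative to the first term. To do this we note that (see Remark in Sec. 4, Ch. 3, p. 56)

$$\left|E\left(D^2\,\frac{\bar Y^2 - \bar y^2}{\bar y^2}\right)\right| \le \sqrt{E(D^4)\;E\left(\frac{\bar Y^2 - \bar y^2}{\bar y^2}\right)^2} \qquad (11.7)$$

Now let $\bar y_0$ be the smallest of the possible sample averages $\bar y$ from the population, and assume all $\bar y > 0$. By substituting $\bar y_0$ for $\bar y$ in the denominator of the right-hand side of Eq. 11.7, we have

$$\left|E\left(D^2\,\frac{\bar Y^2 - \bar y^2}{\bar y^2}\right)\right| \le \sqrt{E(D^4)\;\frac{E(\bar y - \bar Y)^2(\bar y + \bar Y)^2}{\bar y_0^4}} \qquad (11.8)$$

By evaluating the expected values of the terms in Eq. 11.8 we find that the order of the first factor* under the radical is $1/n^2$ and of the second factor is $1/n$, so that the right-hand side of Eq. 11.8 is of order $1/n^{3/2}$, whereas the approximation, Eq. 11.5, is of order $1/n$. Hence, for n sufficiently large, the second term of Eq. 11.6 will be small relative to the approximation to the variance. The same argument applies to ratios of estimated totals and of other sample estimates.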


12. An indication of a sufficient condition for the approximation to the standard deviation (or coefficient of variation) of an estimated ratio to be reasonably satisfactory (Vol. I, Ch. 4, Sec. 18).† To prove: The approximation to the standard deviation (or coefficient of variation) of an estimated ratio, r = u/w, will be reasonably satisfactory provided that $V_w$ is less than about .05, whatever the values of $V_u$ and ρ, where $V_w$ is the coefficient of variation of the denominator of the ratio, $V_u$ is the coefficient of variation of the numerator, and ρ is the correlation of u and w.

Proof. Fieller (Ref. 1) has shown that exact confidence limits, $R_1$ and $R_2$, for the ratio R = U/W are given by the solution of

$$(u^2 - t^2s_u^2) - 2R(uw - t^2s_{uw}) + R^2(w^2 - t^2s_w^2) = 0 \qquad (12.1)$$

where t determines the appropriate probabilities of the normal distribution (u and w being assumed normally distributed), and $s_u^2$, $s_w^2$, and $s_{uw}$ are the sample estimates of the variances and covariance of u and w. Then at the significance level t, $\Delta R_2$ is the relative length of the upper part of the confidence interval, and $\Delta R_1$ is the relative length of the lower part, where

$$\Delta R_2 = \frac{R_2 - r}{r}, \qquad \Delta R_1 = \frac{r - R_1}{r} \qquad (12.2)$$

We can estimate the rel-variance of a ratio r by substituting sample estimates for each term in Eq. 11.1; that is, an estimate of $V_r^2$ is

$$v_r^2 = v_u^2 + v_w^2 - 2\rho'v_uv_w \qquad (12.3)$$


* See Theorem C.1 in Appendix C to Ch. 3, p. 87.
† By Margaret Gurney, Bureau of the Census.
‡ May be deferred.






where $v_u^2$, $v_w^2$, and ρ' are the sample estimates of the rel-variances and correlation of u and w. With this estimate of the rel-variance, symmetric relative confidence limits are obtained in the usual manner by computing

$$r(1 \pm tv_r) \qquad (12.4)$$

Thus, $tv_r$ plays a role in defining confidence limits similar to $\Delta R_i$, and we can get a measure of the closeness of the approximation of Eq. 12.4 to the limits obtained from Eq. 12.1 by comparing $tv_r$ with $\Delta R_1$ and $\Delta R_2$. Let

$$\varepsilon_2 = \frac{\Delta R_2 - tv_r}{tv_r} = \frac{t(v_w^2 - \rho'v_uv_w) + \sqrt{v_r^2 - t^2v_u^2v_w^2(1 - \rho'^2)}}{(1 - t^2v_w^2)\,v_r} - 1 = \frac{tv_wx + \sqrt{1 - t^2v_w^2(1 - x^2)}}{1 - t^2v_w^2} - 1 \qquad (12.5)$$

where

$$x = \frac{1 - k\rho'}{\sqrt{1 + k^2 - 2k\rho'}} \qquad (12.6)$$

$k = v_u/v_w$, and $b = v_w^2 - \rho'v_uv_w$ is the sample estimate of the relative bias of r as an estimate of R (see Eq. 14.1 in Sec. 14).

$\varepsilon_1$ is defined analogously for the lower part of the confidence interval and is obtained by replacing t by -t.

Note that

$$x^2 = \frac{(1 - k\rho')^2}{(1 - k\rho')^2 + k^2(1 - \rho'^2)} \qquad (12.7)$$

and cannot exceed 1.

If we let t and x be positive in Eq. 12.5, we have the maximum of the absolute values of $\varepsilon_1$ and $\varepsilon_2$, which we shall call ε. Then ε is monotonically increasing with increasing x when $tv_w < 1$. Hence, since x cannot exceed 1, when $tv_w < 1$ the maximum possible value of ε is obtained by setting x = 1 in Eq. 12.5. We then have

$$\varepsilon(\max) = \frac{1 + tv_w}{1 - t^2v_w^2} - 1 = \frac{tv_w}{1 - tv_w}$$

Since ε(max) increases with $v_w$, if we put a maximum value on ε, say ε(max) = .15, we can find for any probability level (i.e., a specified t) the maximum value of $v_w$ such that ε < .15.

For example, if we specify t = 2.5, we have

$$\frac{2.5v_w}{1 - 2.5v_w} < .15, \qquad\text{i.e.,}\qquad v_w < \frac{.15}{2.5 \times 1.15} \doteq .052$$

Thus, for $v_w < .05$, the use of $r(1 \pm tv_r)$ as confidence limits will give approximately the same results as the exact limits from Eq. 12.1 (for normally distributed variates), so that $v_r^2$ provides a reasonably good approximation to the rel-variance of r.

Remark 1. If b = 0, then x = 0, and for t = 2.5 and ε(max) = .15 we have, by substituting these values in Eq. 12.5,

$$1.15 \ge \frac{1}{\sqrt{1 - 6.25v_w^2}}$$

or $v_w^2 \le .039$ (i.e., $v_w \le .20$ approximately), so that $v_w < .15$ is sufficient for the approximation to be good.

Remark 2. By substituting the specified t and the observed values of x and $v_w$ in Eq. 12.5, ε can be computed for any particular sample results.

Remark 3. The same reasoning applies also to the population rel-variances, with $V_u^2$, $V_w^2$, and ρ substituted for $v_u^2$, $v_w^2$, and ρ' in Eq. 12.5.
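The comparison between the symmetric limits of Eq. 12.4 and Fieller's limits can be carried out numerically. In the Python sketch below, `fieller_relative_lengths` solves the quadratic of Eq. 12.1 after dividing through by $w^2$, and `epsilon` evaluates the closed form of Eq. 12.5. The two routes should agree. The function names and the values of t, $v_u$, $v_w$, and ρ' are illustrative assumptions.

```python
import math


def fieller_relative_lengths(t, vu, vw, rho):
    """Relative lengths (R2 - r)/r and (r - R1)/r of Fieller's limits.

    Dividing Eq. 12.1 by w^2 gives a*(R/r)^2 - 2*b*(R/r) + c = 0 with
    a = 1 - t^2 vw^2, b = 1 - t^2 rho vu vw, c = 1 - t^2 vu^2.
    Requires t*vw < 1 so that both limits are finite.
    """
    a = 1.0 - t * t * vw * vw
    b = 1.0 - t * t * rho * vu * vw
    c = 1.0 - t * t * vu * vu
    root = math.sqrt(b * b - a * c)
    return (b + root) / a - 1.0, 1.0 - (b - root) / a


def epsilon(t, vu, vw, rho):
    """Eq. 12.5 with x from Eq. 12.6 (pass -t for the lower part)."""
    k = vu / vw
    x = (1.0 - k * rho) / math.sqrt(1.0 + k * k - 2.0 * k * rho)
    num = t * vw * x + math.sqrt(1.0 - t * t * vw * vw * (1.0 - x * x))
    return num / (1.0 - t * t * vw * vw) - 1.0


def epsilon_max(t, vw):
    """Eq. 12.5 evaluated at x = 1: t*vw / (1 - t*vw)."""
    return t * vw / (1.0 - t * vw)


if __name__ == "__main__":
    t, vu, vw, rho = 2.5, 0.08, 0.04, 0.3
    vr = math.sqrt(vu**2 + vw**2 - 2 * rho * vu * vw)  # Eq. 12.3
    d_upper, d_lower = fieller_relative_lengths(t, vu, vw, rho)
    print("Fieller:", d_upper, d_lower, " symmetric t*v_r:", t * vr)
    print("eps2 =", epsilon(t, vu, vw, rho), " eps1 =", epsilon(-t, vu, vw, rho))
```

Setting x = 1 (for example ρ' = 1 with $v_u < v_w$) makes `epsilon` coincide with `epsilon_max`, which is the worst case used in the argument.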


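13. The rel-variance of the ratio of two random variables estimated from a simple random sample drawn without replacement (Vol. I, Ch. 4, Sec. 18). To prove: The rel-variance of r for a simple random sample of n units drawn without replacement from a population of N units is given approximately by

$$V_r^2 \doteq \frac{N - n}{N}\,\frac{V_X^2 + V_Y^2 - 2\rho_{XY}V_XV_Y}{n} \qquad (13.1 \text{ or I-4.18.1})$$

where

$$r = \frac{x}{y} \text{ is an estimate of } R = \frac{X}{Y}$$

$$V_X^2 = \frac{S_X^2}{\bar X^2} = \frac{\sum^N(X_i - \bar X)^2}{(N - 1)\bar X^2}, \qquad V_Y^2 = \frac{S_Y^2}{\bar Y^2} = \frac{\sum^N(Y_i - \bar Y)^2}{(N - 1)\bar Y^2}$$

$$\rho_{XY}V_XV_Y = \frac{S_{XY}}{\bar X\bar Y} = \frac{\sum^N(X_i - \bar X)(Y_i - \bar Y)}{(N - 1)\bar X\bar Y}$$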

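Proof. By Sec. 11 with $u = \bar x$ and $w = \bar y$ we have

$$\sigma_r^2 \doteq R^2\left[V_{\bar x}^2 + V_{\bar y}^2 - 2\rho_{\bar x\bar y}V_{\bar x}V_{\bar y}\right] \qquad (13.2)$$

By Sec. 2

$$V_{\bar x}^2 = \frac{\sigma_{\bar x}^2}{\bar X^2} = \frac{N - n}{N}\,\frac{V_X^2}{n}$$

and similarly for $V_{\bar y}^2$. By Sec. 3

$$\rho_{\bar x\bar y}V_{\bar x}V_{\bar y} = \frac{\sigma_{\bar x\bar y}}{\bar X\bar Y} = \frac{N - n}{N}\,\frac{\rho_{XY}V_XV_Y}{n}$$

Substitute these values into $\sigma_r^2$ (Eq. 13.2) and divide by $R^2$ to obtain the desired result.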
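One way to see how Eq. 13.1 behaves is to compare it with the exact mean square error of r, computed by enumerating every sample from a small population. The Python sketch below does this. The population values are made up for illustration. Note that the second function measures $E(r - R)^2/R^2$, which includes the squared bias, rather than the variance of r itself.

```python
from itertools import combinations
from statistics import mean


def approx_rel_var_ratio(X, Y, n):
    """Eq. 13.1: (N - n)/N * (V_X^2 + V_Y^2 - 2 rho V_X V_Y) / n
    for r = xbar/ybar under simple random sampling without replacement."""
    N = len(X)
    Xb, Yb = mean(X), mean(Y)
    Sxx = sum((x - Xb) ** 2 for x in X) / (N - 1)
    Syy = sum((y - Yb) ** 2 for y in Y) / (N - 1)
    Sxy = sum((x - Xb) * (y - Yb) for x, y in zip(X, Y)) / (N - 1)
    rel = Sxx / Xb**2 + Syy / Yb**2 - 2 * Sxy / (Xb * Yb)
    return (N - n) / N * rel / n


def exact_rel_mse_ratio(X, Y, n):
    """E(r - R)^2 / R^2 averaged over all C(N, n) equally likely samples."""
    R = sum(X) / sum(Y)
    sq_errors = []
    for s in combinations(range(len(X)), n):
        r = sum(X[i] for i in s) / sum(Y[i] for i in s)
        sq_errors.append((r - R) ** 2)
    return mean(sq_errors) / R**2


if __name__ == "__main__":
    X = [3, 5, 8, 9, 12, 14, 15, 20]
    Y = [2, 4, 5, 7, 8, 10, 11, 13]
    for n in (2, 4, 6):
        print(n, approx_rel_var_ratio(X, Y, n), exact_rel_mse_ratio(X, Y, n))
```

Algebraically, the approximation equals $\frac{N-n}{N}\,\frac{S_Z^2}{n\bar X^2}$ with $Z_i = X_i - RY_i$, so it vanishes when X is exactly proportional to Y.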


14. An approximation to the bias of a ratio estimate (Vol. I, Ch. 4, Sec. 18). The bias of r = u/w as an estimate of R = U/W, where u and w are random variables and Eu = U, Ew = W, is given approximately by $R(V_w^2 - \rho_{uw}V_uV_w)$. To prove:

$$E(r - R) \doteq R(V_w^2 - \rho_{uw}V_uV_w) \qquad (14.1)$$


Proof. The procedure for generating an approximation to the bias is the same as that given in Sec. 11. Here we let

$$\Delta u = \frac{u - Eu}{Eu}, \qquad u = (Eu)(1 + \Delta u) = U(1 + \Delta u)$$

Similarly,

$$\Delta w = \frac{w - Ew}{Ew}, \qquad w = (Ew)(1 + \Delta w) = W(1 + \Delta w)$$

then

$$E\left(\frac{u}{w} - \frac{U}{W}\right) = \frac{U}{W}E\{(1 + \Delta u)[1 - \Delta w + (\Delta w)^2 - (\Delta w)^3 + \cdots] - 1\}$$

$$\doteq R\,E[1 + \Delta u - \Delta w + (\Delta w)^2 - (\Delta w)^3 - \Delta u\Delta w + \Delta u(\Delta w)^2 - 1]$$

and, ignoring the terms of order $1/n^2$ or higher, we have

$$E\left(\frac{u}{w} - \frac{U}{W}\right) \doteq R(V_w^2 - \rho_{uw}V_uV_w) \qquad (14.2)$$

In a manner similar to that given in Sec. 11, the remainder term in the approximation can be shown to approach 0 faster than Eq. 14.2. Hence, for sufficiently large n, Eq. 14.2 is a satisfactory approximation to the bias.





Remark 1. When the regression line of u on w passes through the origin, the approximation to the bias is zero. This can be seen from the considerations below. The regression line of u on w is

$$u - U = \rho_{uw}\frac{\sigma_u}{\sigma_w}(w - W) \qquad (14.3)$$

If this line goes through the origin, then

$$U = \rho_{uw}\frac{\sigma_u}{\sigma_w}W$$

and

$$\frac{V_w}{V_u} = \rho_{uw}$$

so that $V_w^2 - \rho_{uw}V_uV_w = 0$; i.e., the approximation to the bias is 0 when the regression of u on w is through the origin.

Remark 2. For simple random sampling with r = x/y, the approximation to the bias of r as an estimate of R = X/Y is given by

$$E(r - R) \doteq \frac{N - n}{N}\,\frac{R(V_Y^2 - \rho_{XY}V_XV_Y)}{n} \qquad (14.4)$$


15. Decrease in the bias of the ratio estimate relative to the standard deviation with increasing sample size (Vol. I, Ch. 4, Sec. 18). To prove: With simple random sampling, the bias of a ratio estimate r = x/y decreases faster than the standard error of r, and with a moderately large sample the bias of r will be negligible in relation to its standard error.

Proof. From Eq. 14.4 the approximation to the bias of r is given by

$$A = \frac{N - n}{N}\,\frac{R}{n}(V_Y^2 - \rho_{XY}V_XV_Y) \qquad (15.1)$$

and from Eq. 13.1, the approximation to the variance of r is

$$\sigma_r^2 = \frac{N - n}{N}\,\frac{R^2}{n}(V_X^2 + V_Y^2 - 2\rho_{XY}V_XV_Y) \qquad (15.2)$$

Consequently

$$\frac{A}{\sigma_r} = \sqrt{\frac{N - n}{Nn}}\;\frac{V_Y^2 - \rho_{XY}V_XV_Y}{\sqrt{V_X^2 + V_Y^2 - 2\rho_{XY}V_XV_Y}}$$

or

$$\frac{A^2}{\sigma_r^2} = \frac{N - n}{Nn}\;\frac{(V_Y^2 - \rho_{XY}V_XV_Y)^2}{V_X^2 + V_Y^2 - 2\rho_{XY}V_XV_Y}$$

which decreases with increasing n.
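The rate at which the bias becomes negligible can be tabulated directly from Eq. 15.1 and 15.2. This Python sketch computes the approximate ratio of bias to standard error for several sample sizes; the parameter values ($V_X$, $V_Y$, $\rho_{XY}$, N) are hypothetical.

```python
import math


def bias_to_se(n, N, VX, VY, rho):
    """Approximate A / sigma_r for r = x/y under simple random sampling,
    from Eq. 15.1 (bias) and Eq. 15.2 (variance); the factor R cancels."""
    fpc = (N - n) / N
    rel_bias = fpc / n * (VY**2 - rho * VX * VY)
    rel_var = fpc / n * (VX**2 + VY**2 - 2 * rho * VX * VY)
    return rel_bias / math.sqrt(rel_var)


if __name__ == "__main__":
    for n in (10, 40, 160, 640):
        print(n, round(bias_to_se(n, 100_000, VX=1.0, VY=0.8, rho=0.6), 4))
```

With the finite population correction near 1, each fourfold increase in n roughly halves the ratio, reflecting the $1/\sqrt{n}$ factor.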



16. Two conditions under which the sample ratio is an unbiased estimate of the population ratio (Vol. I, Ch. 4, Sec. 18).

a. The ratio r = u/w of two random variables u and w is an unbiased estimate of R = Eu/Ew when u/w and w are uncorrelated (since $E[(u/w)w] = Eu$, a zero covariance between u/w and w gives $E(u/w)\,Ew = Eu$).

b. If $u = \sum^n x_i$ and $w = \sum^n y_i$, the ratio $r = \sum^n x_i/\sum^n y_i$ is an unbiased estimate of R = X/Y when the conditional expected value of $x_i$ is equal to $Ry_i$ for any given $y_i$. This is a special case of a above.



17. The variance of an average and of a total for a subset of the population (Vol. I, Ch. 4, Sec. 10 and 16). For the class of populations for which

$$X_i = 0 \text{ whenever } Y_i = 0, \qquad Y_i = 1 \text{ or } 0$$

a. To prove: The conditional variance of r = x/y for a particular sample of n units (i.e., for a fixed $n_g$) is

$$\sigma_{r|n_g}^2 = \frac{N_g - n_g}{N_gn_g}S_g^2$$

and the expected value of this conditional variance over all samples of n units is given approximately by

$$\sigma_r^2 \doteq \frac{S_g^2}{fN_g}\left(1 - f + V_{n_g}^2\right) \qquad (17.1)$$

where f = n/N; $N_g$ is the number of units for which $Y_i = 1$; $n_g$ is the number of such units in the sample; $S_g^2$ is the variance among such units, i.e.,

$$S_g^2 = \frac{\sum^{N_g}(X_i - \bar X_g)^2}{N_g - 1}, \qquad \bar X_g = \frac{\sum^{N_g}X_i}{N_g} \qquad (17.2)$$

and $V_{n_g}^2$ is given by Eq. 17.9.

An unbiased estimate of $\bar X_g$ is

$$r = \frac{x}{y} = \frac{\sum^n x_i}{\sum^n y_i} = \frac{\sum^{n_g}x_{gi}}{n_g} \qquad (17.3)$$

In $\sum^{n_g}x_{gi}$, $x_{gi}$ is the value of the X-characteristic for the ith element in the sample having $y_i = 1$.

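Proof. By definition

$$\sigma_r^2 = E(r - Er)^2 = E\left(\frac{\sum^{n_g}x_{gi}}{n_g} - \frac{\sum^{N_g}X_i}{N_g}\right)^2 \qquad (17.4)$$

By Theorem 14, Ch. 3 (p. 61), we may write

$$\sigma_r^2 = E\,E_{n_g}\left(\frac{\sum^{n_g}x_{gi}}{n_g} - \bar X_g\right)^2 \qquad (17.5)$$

where $E_{n_g}$ means the conditional expected value for a fixed $n_g$ of the expression immediately following it. By Sec. 2 and 10

$$E_{n_g}\left(\frac{\sum^{n_g}x_{gi}}{n_g} - \bar X_g\right)^2 = \left(\frac{1}{n_g} - \frac{1}{N_g}\right)\frac{\sum^{N_g}(X_i - \bar X_g)^2}{N_g - 1} \qquad (17.6)$$

From Eq. 17.2, 17.5, and 17.6, we have

$$\sigma_r^2 = \left[E\left(\frac{1}{n_g}\right) - \frac{1}{N_g}\right]S_g^2 \qquad (17.7)$$

Now let $n_g = (En_g)(1 + \Delta n_g)$, where $\Delta n_g = \frac{n_g - En_g}{En_g}$. Since

$$\frac{1}{1 + \Delta n_g} = 1 - \Delta n_g + (\Delta n_g)^2 - \cdots$$

it follows that

$$E\left(\frac{1}{n_g}\right) = \frac{1}{En_g}E\{1 - \Delta n_g + (\Delta n_g)^2 - \cdots\} \doteq \frac{1}{fN_g}\left(1 + V_{n_g}^2\right) \qquad (17.8)$$

since, by Sec. 10, $En_g = (n/N)N_g = fN_g$. Furthermore, since $n_g/n$ is the sample proportion of units having $Y_i = 1$, it follows from Sec. 2 that

$$V_{n_g}^2 = \frac{N - n}{N - 1}\,\frac{Q}{nP} \qquad (17.9)$$

where

$$P = \frac{N_g}{N}, \qquad Q = 1 - P$$

Substitute the approximation of $E(1/n_g)$ in Eq. 17.7 to obtain the result given by Eq. 17.1.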

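b. To prove: The rel-variance of x', an estimate of a total for a subset of the population, where

$$x' = \frac{N}{n}\sum^n x_i$$

is approximately

$$V_{x'}^2 \doteq \frac{1 - f}{n}\,\frac{V_g^2 + Q}{P} \qquad (17.10 \text{ or I-4.10.6})$$

where

$$P = \frac{N_g}{N}, \qquad Q = 1 - P$$

and $V_g^2 = S_g^2/\bar X_g^2$.

Proof. By definition

$$V_{x'}^2 = \frac{\sigma_{x'}^2}{(Ex')^2}$$

Now

$$Ex' = X = N_g\bar X_g$$

and

$$\sigma_{x'}^2 = N^2\,\frac{1 - f}{n}\,\frac{\sum^N(X_i - \bar X)^2}{N - 1}$$

where $\bar X = X/N = P\bar X_g$. If N is large, so that we may assume N/(N - 1) = 1,

$$V_{x'}^2 \doteq \frac{1 - f}{n}\,\frac{\sum^N X_i^2 - N\bar X^2}{N\bar X^2} = \frac{1 - f}{n}\left(\frac{1}{P}\,\frac{\sum^{N_g}X_i^2}{N_g\bar X_g^2} - 1\right)$$

and since (taking also $N_g/(N_g - 1) = 1$) $\sum^{N_g}X_i^2/(N_g\bar X_g^2) \doteq 1 + V_g^2$,

$$V_{x'}^2 \doteq \frac{1 - f}{n}\left(\frac{1 + V_g^2}{P} - 1\right) = \frac{1 - f}{n}\,\frac{V_g^2 + Q}{P}$$

the result given by Eq. 17.10.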
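The accuracy of the series approximation to $E(1/n_g)$, which drives Eq. 17.1, can be checked against the exact hypergeometric distribution of $n_g$. The Python sketch below uses illustrative population counts. Because $1/n_g$ is undefined when $n_g = 0$, the exact value is computed conditionally on $n_g \ge 1$; this matters only when the chance of an empty subset sample is not negligible.

```python
from math import comb


def exact_mean_inverse_ng(N, Ng, n):
    """E(1/n_g | n_g >= 1), n_g hypergeometric (Ng of N units, sample of n)."""
    total = comb(N, n)
    p0 = comb(N - Ng, n) / total
    acc = 0.0
    for k in range(1, min(n, Ng) + 1):
        acc += comb(Ng, k) * comb(N - Ng, n - k) / total / k
    return acc / (1.0 - p0)


def approx_mean_inverse_ng(N, Ng, n):
    """(1 + V^2_{n_g}) / (f N_g), with V^2_{n_g} from Eq. 17.9."""
    f = n / N
    P = Ng / N
    v2 = (N - n) / (N - 1) * (1.0 - P) / (n * P)
    return (1.0 + v2) / (f * Ng)


def expected_cond_var(N, Ng, n, Sg2):
    """Eq. 17.1: S_g^2 (1 - f + V^2_{n_g}) / (f N_g)."""
    f = n / N
    P = Ng / N
    v2 = (N - n) / (N - 1) * (1.0 - P) / (n * P)
    return (1.0 - f + v2) / (f * Ng) * Sg2


if __name__ == "__main__":
    N, Ng, n = 1000, 300, 100
    print(exact_mean_inverse_ng(N, Ng, n), approx_mean_inverse_ng(N, Ng, n))
```

For these counts the two values of $E(1/n_g)$ differ by well under one per cent.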


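18. The rel-variance of the estimated variance of a ratio estimate* (Vol. I, Ch. 4, Sec. 21, Eq. 21.6). To prove: The rel-variance, $V_{s_r^2}^2$, of the estimated variance, $s_r^2$, of the ratio r = x/y for a simple random sample of n units drawn with replacement is given approximately by

$$V_{s_r^2}^2 \doteq \frac{\beta_Z - 1}{n} + \frac{4V_Y^2}{n} - \frac{4\rho_{Z^2Y}V_Y\sqrt{\beta_Z - 1}}{n} \qquad (18.1)$$

where $\rho_{Z^2Y}$ is the correlation between $Z^2$ and Y, $Z = X - RY$, $\beta_Z$ and $V_Y$ are defined in Sec. 5 and 13, respectively, and either

$$s_r^2 = \frac{1}{\bar y^2}\left(s_{\bar x}^2 + r^2s_{\bar y}^2 - 2rs_{\bar x\bar y}\right), \qquad s_{\bar x}^2 = \frac{s_x^2}{n}, \qquad s_{\bar y}^2 = \frac{s_y^2}{n}$$

where $s_x^2$ and $s_y^2$ are given by Eq. 4.1, and

$$s_{\bar x\bar y} = \frac{\sum^n(x_i - \bar x)(y_i - \bar y)}{(n - 1)n}$$

or

$$s_r^2 = \frac{s_{\bar z}^2}{\bar y^2}, \qquad s_{\bar z}^2 = \frac{\sum^n z_i^2}{(n - 1)n}, \qquad z_i = x_i - ry_i$$

(since $\bar z = \bar x - r\bar y = 0$).

* May be taken up with Chapter 10 instead of Chapter 4.

Consider

$$s_r'^2 = \frac{m}{n\bar y^2}, \qquad m = \frac{\sum^n z_i'^2}{n}, \qquad z_i' = x_i - Ry_i$$

The rel-variance of $s_r'^2$ will be a good approximation to the rel-variance of $s_r^2$ for large enough n. The derivation here substitutes R for r. For the effect in an analogous situation where $\bar X$ is substituted for $\bar x$, see Sec. 5.

Therefore, for n large,

$$V_{s_r^2}^2 \doteq V_{s_r'^2}^2 = V_{m/\bar y^2}^2$$

Sampling with replacement has been assumed for simplicity and will be a good approximation to sampling without replacement whenever N is large relative to n.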

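Proof. By Eq. 11.1, with u = m and $w = \bar y^2$,

$$V_{m/\bar y^2}^2 \doteq V_m^2 + V_{\bar y^2}^2 - 2\rho_{m\bar y^2}V_mV_{\bar y^2} \qquad (18.2)$$

Consider the first term in the right-hand side of Eq. 18.2. By the procedure used in Sec. 5, we find

$$V_m^2 \doteq \frac{\beta_Z - 1}{n} \qquad (18.3)$$

Similarly,

$$V_{\bar y^2}^2 \doteq \frac{4V_Y^2}{n} \qquad (18.4)$$

where terms of order $1/n^2$ or higher are ignored (see Sec. 6). Finally,

$$\rho_{m\bar y^2}V_mV_{\bar y^2} \doteq \frac{2\rho_{Z^2Y}V_Y\sqrt{\beta_Z - 1}}{n} \qquad (18.5)$$

which is left to the reader to be developed as an exercise. Substituting Eq. 18.3, 18.4, and 18.5 in Eq. 18.2, we obtain Eq. 18.1.

Exercise. Prove that when $m = \frac{1}{n}\sum^n z_i'^2$ and $w = \bar y^2$, and n is very large, Eq. 18.5 holds.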


19. Condition for the ratio estimate to have a smaller rel-variance than the simple unbiased estimate (Vol. I, Ch. 4, Sec. 19). To prove: The approximation to the rel-variance of the ratio of two random variables u/w will be smaller than $V_u^2$, the rel-variance of an unbiased estimate of a mean or total from the same sample, when

$$\rho_{uw} > \frac{V_w}{2V_u} \qquad (19.1)$$

The proof is left to the reader.

Special case of Eq. 19.1. Let

$$u = \frac{x'}{X}, \quad x' = \frac{N}{n}\sum^n x_i \qquad\text{and}\qquad w = \frac{y'}{Y}, \quad y' = \frac{N}{n}\sum^n y_i$$

From Sec. 2 it follows that

$$V_{(x'/X)}^2 = \left(1 - \frac{n}{N}\right)\frac{V_X^2}{n} \qquad (19.2)$$

Similarly, it follows that

$$V_{(y'/Y)}^2 = \left(1 - \frac{n}{N}\right)\frac{V_Y^2}{n} \qquad (19.3)$$

From Eq. 3.4 of Sec. 3, it follows that

$$\rho_{(x'/X)(y'/Y)} = \rho_{XY} \qquad (19.4)$$

And, substituting Eq. 19.2, 19.3, and 19.4 into 19.1, we see that the condition for gain through the use of the ratio estimate $\frac{x'}{y'}Y$ over the simple unbiased estimate x' is

$$\rho_{XY} > \frac{V_Y}{2V_X}$$
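Condition 19.1 is a rearrangement of the inequality $V_u^2 + V_w^2 - 2\rho_{uw}V_uV_w < V_u^2$ (for positive $V_u$ and $V_w$). The short Python sketch below confirms, over an illustrative grid of values, that the direct comparison and the condition select the same cases.

```python
from itertools import product


def ratio_rel_var(Vu, Vw, rho):
    """Eq. 11.1: approximate rel-variance of u/w."""
    return Vu**2 + Vw**2 - 2 * rho * Vu * Vw


def ratio_gains(Vu, Vw, rho):
    """Condition 19.1: rho > Vw / (2 Vu)."""
    return rho > Vw / (2 * Vu)


if __name__ == "__main__":
    grid = product([0.1, 0.5, 1.0], [0.1, 0.5, 1.0], [-0.5, 0.0, 0.3, 0.6, 0.9])
    for Vu, Vw, rho in grid:
        direct = ratio_rel_var(Vu, Vw, rho) < Vu**2
        print(Vu, Vw, rho, direct, ratio_gains(Vu, Vw, rho))
```

In the special case, $V_w/V_u$ reduces to $V_Y/V_X$ because the common factor $(1 - n/N)/n$ cancels.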


*20. Consistent estimates of averages and variances (Vol. I, Ch. 4, Sec. 12). To prove: The sample estimate $\bar x$ of $\bar X$ and the sample estimate $s^2$ of $S^2$ are unbiased and consistent estimates of $\bar X$ and $S^2$, respectively, where $\bar x$ and $\bar X$ are defined by Eq. 1.1, $s^2$ is given by Eq. 4.1, and $S^2$ by Eq. 2.1.

Proof. The proof that $\bar x$ is an unbiased estimate of $\bar X$ is given in Sec. 1, and that $s^2$ is an unbiased estimate of $S^2$ is given in Sec. 4. Since $\sigma_{\bar x}^2$ and $\sigma_{s^2}^2$ approach 0 as n increases, it follows from the corollary of Theorem 19, Ch. 3 (p. 75), that $\bar x$ and $s^2$ are consistent estimates of $\bar X$ and $S^2$.

*21. Functions of random variables which are consistent estimates of the same function of population characteristics (Vol. I, Ch. 4, Sec. 18 and 21).

a. $\bar x/\bar y$ is a consistent estimate of $\bar X/\bar Y$, where $\bar x$ and $\bar y$ are defined in Sec. 1, and $\bar Y \ne 0$.

b. $s/\bar x$ is a consistent estimate of $S/\bar X = V$, where s is given by Eq. 4.1 and S by Eq. 2.1, and $\bar X \ne 0$.

c. $s^2/\bar x^2$ is a consistent estimate of $S^2/\bar X^2 = V^2$, and $\bar X \ne 0$.

d. $s_r^2 = r^2(v_{\bar x}^2 + v_{\bar y}^2 - 2v_{\bar x\bar y})$ is a consistent estimate of $\sigma_r^2$, where $s_r^2$ is defined in Sec. 18, $\sigma_r^2$ is given in Sec. 13, and $\bar Y \ne 0$.

The proofs follow immediately from Corollary 2 of Theorem 20, Ch. 3 
(p. 75), which states that a rational function of consistent estimates is a 
consistent estimate of the same rational function of the quantities being 
estimated, if the denominator does not vanish when the quantities 
estimated are substituted in the rational function. 

* May be deferred. 


REFERENCES 

(1) E. C. Fieller, "The Biological Standardization of Insulin," J. Roy. Stat. Soc. Supplement, VII (1940-1941).

(2) W. G. Cochran, Sampling Techniques, John Wiley & Sons, New York, 1953, Chapters 2, 3, and 4.

(3) W. Edwards Deming, Some Theory of Sampling, John Wiley & Sons, New York, 1950, Chapter 4.



CHAPTER 5 


Stratified Simple Random Sampling 


DERIVATIONS, PROOFS, AND SOME EXTENSIONS OF
THEORY FOR CH. 5 OF VOL. I*

Note. A stratified simple random sampling plan is one in which the elements (sampling units) of the population are divided into groups, referred to as strata, such that each element is contained in one and only one stratum. The sample is selected by drawing a simple random sample of elements from each stratum. The sampling fraction may vary from stratum to stratum or may be uniform. When the sampling fraction is uniform the sampling plan is referred to as proportionate stratified sampling.

Note that the theory of stratified simple random sampling is equally applicable, once the sampling units have been defined, whether the sampling units are elementary units, as assumed above, or clusters of elementary units. Specific consideration of cluster sampling is deferred to succeeding chapters.

The notation in this chapter is the same as that followed in Chapter 4 except that a subscript (h) is added to designate the strata.

1. The expected value, variance, covariance, and correlation of unbiased estimates from a stratified sample (Vol. I, Ch. 5, Sec. 3). To prove:

a. Estimates of the form

$$\bar x = \frac{\sum_h^L N_h\bar x_h}{N} \qquad (1.1 \text{ or I-5.3.2})$$

are unbiased estimates of $\bar X = \frac{\sum_h^L N_h\bar X_h}{N}$, where $\bar x_h = x_h/n_h = \sum_i^{n_h}x_{hi}/n_h$ is a sample mean based on a simple random sample of $n_h$ units from the hth stratum, $x_{hi}$ is the value of a characteristic for the ith unit in the sample from the hth stratum, $N_h$ is the number of units in the hth stratum, L is the number of strata, $N = \sum_h^L N_h$, $\bar X_h = X_h/N_h = \sum_i^{N_h}X_{hi}/N_h$ is the population mean for the hth stratum, and $X_{hi}$ is the value of the characteristic for the ith unit in the population in the hth stratum.

* References to Vol. I are shown in parentheses after section or subsection headings. The number following I- after some equations gives the chapter, section, and number of that particular equation in Vol. I.

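b. The variance of $\bar x$ is

$$\sigma_{\bar x}^2 = \frac{1}{N^2}\sum_h^L N_h^2\frac{1 - f_h}{n_h}S_{hX}^2 \qquad (1.2 \text{ or I-5.3.9})$$

where

$$S_{hX}^2 = \frac{\sum_i^{N_h}(X_{hi} - \bar X_h)^2}{N_h - 1} \qquad (1.3 \text{ or I-5.1.2})$$

and

$$f_h = \frac{n_h}{N_h}$$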

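c. The covariance of $\bar x$ and $\bar y$ is

$$\sigma_{\bar x\bar y} = \frac{1}{N^2}\sum_h^L N_h^2\frac{1 - f_h}{n_h}S_{hXY} \qquad (1.4)$$

where

$$S_{hXY} = \frac{\sum_i^{N_h}(X_{hi} - \bar X_h)(Y_{hi} - \bar Y_h)}{N_h - 1} \qquad (1.5)$$

and $\bar x$ and $\bar y$ are sample means for two different characteristics of the units included in the sample.

It follows, by definition of the coefficient of correlation, that the correlation between $\bar x$ and $\bar y$ is

$$\rho_{\bar x\bar y} = \frac{\sigma_{\bar x\bar y}}{\sigma_{\bar x}\sigma_{\bar y}} \qquad (1.6)$$

Note that the variance is a special case of the covariance, i.e.,

$$\sigma_{\bar x}^2 = \sigma_{\bar x\bar x}$$

where $\sigma_{\bar x\bar x}$ is given by Eq. 1.4 with $\bar x$ substituted for $\bar y$, and

$$S_{hX}^2 = S_{hXX}$$

where $S_{hXX}$ is given by Eq. 1.5 with X substituted for Y.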

Proof. a. For the hth stratum, $\bar x_h$ depends on the sample and is a random variable, and $N_h/N$ is a constant. Consequently $\bar x$ is a linear combination of random variables.* If we take the expected value of $\bar x_h$ we have $E\bar x_h = \bar X_h$, which follows from Sec. 1, Ch. 4. Consequently, $E\bar x = \bar X$, and $\bar x$ is an unbiased estimate of $\bar X$. This follows since the expected value of a linear combination of random variables is the same linear combination of the expected values of the random variables (Theorem 6, Ch. 3, p. 49), and by substituting $\bar X_h$ for $\bar x_h$ in Eq. 1.1 we obtain $\bar X$.

* A linear combination of random variables is a sum of the form $u = \sum a_hu_h$, where the $u_h$ are random variables and the $a_h$ are any constants. Thus, if $u = \sum x_h/y_h$, where $x_h$ and $y_h$ are random variables, we say that u is a linear combination of the $x_h/y_h$ but not a linear combination of either the $x_h$'s or the $y_h$'s. Similarly, if $u = \sum x_h^b$, where b is different from 1, we again say that u is a linear combination of the $x_h^b$'s but not of the $x_h$'s. In the case of $\bar x = \sum N_h\bar x_h/N$, we let $a_h = N_h/N$ and $u_h = \bar x_h$, and we say that $\bar x$ is a linear combination of the $\bar x_h$.

b. Since the sample selection is carried out independently in each stratum, $\bar x_h$ is independent of $\bar x_k$, where h and k designate any pair of strata, and $\bar x$ is a linear combination of independent random variables. It follows from Theorem 11, Corollary 1, Ch. 3 (p. 56), that if u is a linear combination of independent random variables of the form

$$u = \sum_h a_hu_h \qquad (1.7)$$

where the $a_h$ are constants and the $u_h$ are independent random variables, the variance of u is

$$\sigma_u^2 = \sum_h a_h^2\sigma_{u_h}^2 \qquad (1.8)$$

Since $\bar x$ is such a linear combination with $a_h = N_h/N$ and $u_h = \bar x_h$, the variance of $\bar x$ is

$$\sigma_{\bar x}^2 = \frac{1}{N^2}\sum_h N_h^2\sigma_{\bar x_h}^2 \qquad (1.9)$$

where $\sigma_{\bar x_h}^2$ is the variance of $\bar x_h$.

From Sec. 2, Ch. 4, we have (since $\bar x_h$ is based on a simple random sample of $n_h$ units from the hth stratum)

$$\sigma_{\bar x_h}^2 = \frac{1 - f_h}{n_h}S_{hX}^2 \qquad (1.10)$$

where $S_{hX}^2$ is given by Eq. 1.3; and by substituting this result in Eq. 1.9 we obtain Eq. 1.2.

c. Since $\bar x_h$ is the mean of a first characteristic of the units included in a sample from the hth stratum, and $\bar y_h$ is the mean of a second characteristic for the same sampled units, then, from Sec. 3, Ch. 4, the covariance of $\bar x_h$ and $\bar y_h$ is

$$\sigma_{\bar x_h\bar y_h} = \frac{1 - f_h}{n_h}S_{hXY} \qquad (1.11)$$

and, since the samples are selected independently in the respective strata, the covariances $\sigma_{\bar x_h\bar y_k} = 0$ for h different from k. It follows from the corollary to Theorem 12, Ch. 3 (p. 58), that since $\bar x$ is a linear combination of the random variables $\bar x_h$, and $\bar y$ is the same linear combination of the random variables $\bar y_h$,

$$\sigma_{\bar x\bar y} = \frac{1}{N^2}\sum_h N_h^2\sigma_{\bar x_h\bar y_h} \qquad (1.12)$$

and, substituting for $\sigma_{\bar x_h\bar y_h}$ (Eq. 1.11) in Eq. 1.12, we obtain Eq. 1.4.

The proof in Part b is seen to be a special case of this result, obtained by substituting $\bar x$ for $\bar y$.

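Remark 1. When a proportionate sample is selected, i.e., $n_h/N_h = n/N = f$, the covariance becomes

$$\sigma_{\bar x\bar y} = \frac{1 - f}{n}\sum_h\frac{N_h}{N}S_{hXY} \qquad (1.13)$$

and the variance becomes

$$\sigma_{\bar x}^2 = \frac{1 - f}{n}S_w^2, \qquad\text{with}\qquad S_w^2 = \frac{\sum_h N_hS_{hX}^2}{N}$$

Remark 2. For estimated totals $x' = \sum_h N_h\bar x_h$ and $y' = \sum_h N_h\bar y_h$ the results corresponding to those given in (a), (b), and (c) become

$$Ex' = X, \qquad \sigma_{x'}^2 = N^2\sigma_{\bar x}^2, \qquad \sigma_{x'y'} = N^2\sigma_{\bar x\bar y}$$

$$\rho_{x'y'} = \frac{\sigma_{x'y'}}{\sigma_{x'}\sigma_{y'}} = \frac{\sigma_{\bar x\bar y}}{\sigma_{\bar x}\sigma_{\bar y}}$$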
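Because Eq. 1.2 is exact for stratified simple random sampling, it can be verified by enumerating every possible stratified sample from a tiny population. The Python sketch below does so; the stratum values and sample sizes are arbitrary illustrations, and the function names are hypothetical.

```python
from itertools import combinations, product
from statistics import mean, variance


def stratified_mean(pop_sizes, samples):
    """Eq. 1.1: xbar = sum_h N_h * xbar_h / N."""
    N = sum(pop_sizes)
    return sum(Nh * mean(s) for Nh, s in zip(pop_sizes, samples)) / N


def stratified_variance(strata, n_h):
    """Eq. 1.2: (1/N^2) sum_h N_h^2 (1 - f_h) S_hX^2 / n_h.
    statistics.variance uses the divisor N_h - 1, matching Eq. 1.3."""
    N = sum(len(p) for p in strata)
    total = 0.0
    for pop, nh in zip(strata, n_h):
        Nh = len(pop)
        total += Nh**2 * (1 - nh / Nh) / nh * variance(pop)
    return total / N**2


def enumerate_variance(strata, n_h):
    """Exact variance and mean of xbar over all equally likely stratified samples."""
    sizes = [len(p) for p in strata]
    per_stratum = [list(combinations(p, nh)) for p, nh in zip(strata, n_h)]
    means = [stratified_mean(sizes, choice) for choice in product(*per_stratum)]
    m = mean(means)
    return mean((x - m) ** 2 for x in means), m


if __name__ == "__main__":
    strata = [[1, 3, 4, 8], [10, 12, 15], [20, 21, 26, 30, 33]]
    n_h = [2, 2, 3]
    print(stratified_variance(strata, n_h), enumerate_variance(strata, n_h))
```

The enumeration treats the strata independently, which is exactly the independence used in part b of the proof.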


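2. The variance of the ratio estimate (Vol. I, Ch. 5, Sec. 4). To prove: The variance of $r = \bar x/\bar y$ is approximately

$$\sigma_r^2 \doteq \frac{1}{\bar Y^2N^2}\sum_h N_h^2\frac{1 - f_h}{n_h}S_{hZ'}^2 \qquad (2.1 \text{ or I-5.4.5})$$

where

$$S_{hZ'}^2 = S_{hX}^2 + R^2S_{hY}^2 - 2R\rho_{hXY}S_{hX}S_{hY} \qquad (2.2 \text{ or I-5.4.6})$$

and where $S_{hX}^2$ and $S_{hY}^2$ are given by Eq. 1.3, $\rho_{hXY} = S_{hXY}/(S_{hX}S_{hY})$ is the correlation between $X_{hi}$ and $Y_{hi}$ in the hth stratum, $S_{hXY}$ is given by Eq. 1.5, and $R = X/Y = \bar X/\bar Y$. The other terms are defined in Sec. 1.

Proof. From Eq. 11.1, Ch. 4,

$$\sigma_r^2 \doteq R^2(V_{\bar x}^2 + V_{\bar y}^2 - 2\rho_{\bar x\bar y}V_{\bar x}V_{\bar y}) \qquad (2.3)$$

where $V_{\bar x} = \sigma_{\bar x}/\bar X$ and $V_{\bar y} = \sigma_{\bar y}/\bar Y$. Then

$$\sigma_r^2 \doteq \frac{R^2}{N^2}\sum_h N_h^2\frac{1 - f_h}{n_h}\left(\frac{S_{hX}^2}{\bar X^2} + \frac{S_{hY}^2}{\bar Y^2} - 2\frac{\rho_{hXY}S_{hX}S_{hY}}{\bar X\bar Y}\right) \qquad (2.4)$$

which follows from Sec. 1, above. Assembling terms, we have

$$\sigma_r^2 \doteq \frac{1}{\bar Y^2N^2}\sum_h N_h^2\frac{1 - f_h}{n_h}\left(S_{hX}^2 + R^2S_{hY}^2 - 2R\rho_{hXY}S_{hX}S_{hY}\right) \qquad (2.5)$$

which is equivalent to Eq. 2.1.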

Exercise. Show that $S_{hZ'}^2$ is the variance of $Z'_{hi} = X_{hi} - RY_{hi}$, i.e.,

$$S_{hZ'}^2 = \frac{\sum_i^{N_h}(Z'_{hi} - \bar Z'_h)^2}{N_h - 1}$$


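3. The variance of the ratio estimate based on the weighted average of ratios of random variables (Vol. I, Ch. 5, Sec. 4). To prove: The variance of the estimate

$$r' = \frac{\sum_h^L r_hY_h}{Y}$$

is given approximately by

$$\sigma_{r'}^2 \doteq \frac{1}{\bar Y^2N^2}\sum_h N_h^2\frac{1 - f_h}{n_h}S_{hZ}^2 \qquad (3.1 \text{ or I-5.4.12})$$

where

$$S_{hZ}^2 = S_{hX}^2 + R_h^2S_{hY}^2 - 2R_h\rho_{hXY}S_{hX}S_{hY} \qquad (3.2 \text{ or I-5.4.13})$$

the terms in $S_{hZ}^2$ are defined in Sec. 1 of this chapter, $R_h = X_h/Y_h$, and $r_h = x_h/y_h$, the ratio of the sample aggregates in the hth stratum.

Proof. It is seen from Sec. 13, Ch. 4, that the variance of $r_h$ is given approximately by $\frac{1 - f_h}{n_h}\,\frac{S_{hZ}^2}{\bar Y_h^2}$. Consequently, by Eq. 1.7 and 1.8 with $a_h = Y_h/Y$ and $u_h = r_h$, the variance of $r' = \sum_h r_hY_h/Y$ is

$$\sigma_{r'}^2 \doteq \sum_h\frac{Y_h^2}{Y^2}\,\frac{1 - f_h}{n_h}\,\frac{S_{hZ}^2}{\bar Y_h^2}$$

and, since $Y_h = N_h\bar Y_h$ and $Y = N\bar Y$, we have Eq. 3.1.

Exercise. Show that $S_{hZ}^2$ is the variance of $Z_{hi} = X_{hi} - R_hY_{hi}$, i.e.,

$$S_{hZ}^2 = \frac{\sum_i^{N_h}(X_{hi} - R_hY_{hi})^2}{N_h - 1}$$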
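The two exercises, and the approximate variances of Eq. 2.1 and 3.1, can be checked on a small made-up population. The Python sketch below (function names hypothetical) computes $S_{hZ'}^2$ and $S_{hZ}^2$ from their component form and compares them with the directly computed variances of $X_{hi} - RY_{hi}$ and $X_{hi} - R_hY_{hi}$.

```python
from statistics import mean, variance


def s2_components(Xh, Yh, R):
    """S_hX^2 + R^2 S_hY^2 - 2 R S_hXY, where S_hXY = rho_hXY S_hX S_hY.
    This is Eq. 2.2 when R is the overall ratio, Eq. 3.2 when R = R_h."""
    Nh = len(Xh)
    xb, yb = mean(Xh), mean(Yh)
    sxy = sum((x - xb) * (y - yb) for x, y in zip(Xh, Yh)) / (Nh - 1)
    return variance(Xh) + R**2 * variance(Yh) - 2 * R * sxy


def var_ratio_estimates(X_strata, Y_strata, n_h):
    """Approximate variances of r (Eq. 2.1) and r' (Eq. 3.1)."""
    N = sum(len(x) for x in X_strata)
    X = sum(map(sum, X_strata))
    Y = sum(map(sum, Y_strata))
    Ybar, R = Y / N, X / Y
    v_r = v_rprime = 0.0
    for Xh, Yh, nh in zip(X_strata, Y_strata, n_h):
        Nh = len(Xh)
        weight = Nh**2 * (1 - nh / Nh) / nh
        Rh = sum(Xh) / sum(Yh)
        v_r += weight * s2_components(Xh, Yh, R)
        v_rprime += weight * s2_components(Xh, Yh, Rh)
    scale = 1.0 / (Ybar**2 * N**2)
    return scale * v_r, scale * v_rprime


if __name__ == "__main__":
    X_strata = [[4, 6, 9, 11], [20, 25, 27, 33, 40]]
    Y_strata = [[2, 3, 5, 6], [8, 9, 12, 14, 15]]
    print(var_ratio_estimates(X_strata, Y_strata, [2, 3]))
```

Since $\bar X_h - R_h\bar Y_h = 0$, the deviations $X_{hi} - R_hY_{hi}$ have mean zero, which is why the second exercise needs no centring term.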







4. Comparison of biases of two ratio estimates, using proportionate stratified sampling (Vol. I, Ch. 5, Sec. 4). To prove: If $A_1$ and $A_2$ are the approximations to the biases of

$$r = \frac{\sum_h N_h\bar x_h}{\sum_h N_h\bar y_h} \qquad\text{and}\qquad r' = \frac{\sum_h r_hY_h}{Y}$$

respectively, and if $\sigma_r^2$ and $\sigma_{r'}^2$ are the approximations to the variances, then for proportionate stratified sampling:

a. Both $A_1^2/\sigma_r^2$ and $A_2^2/\sigma_{r'}^2$ decrease for increasing size of sample provided the number of strata, L, is held constant.

b. If the size of sample is increased by increasing the number of strata (and if the total size of sample is small relative to the population), then $A_1^2/\sigma_r^2$ decreases, but $A_2^2/\sigma_{r'}^2$ may increase with increasing sample size.

c. If the number of strata is L, and if the $\bar X_h$ and $\bar Y_h$ do not vary widely between strata, then $A_2$ is of the order of L times as large as $A_1$.

d. It follows from Parts a and b that a sufficient condition for $A_1$ to be small relative to $\sigma_r$ is that the total sample be large, no matter how small the average size of sample per stratum. It will be zero if $\rho_{\bar x\bar y} = V_{\bar y}/V_{\bar x}$. On the other hand, a sufficient condition for $A_2$ to be small with any size of sample in a stratum is that $\rho_{hXY} = V_{hY}/V_{hX}$ in each stratum (which is the condition for the regression line for X on Y to go through the origin for each stratum). The bias of r' can be small for small samples per stratum under less stringent conditions, but unless the conditions given are approximately met there is a risk of serious bias if r' is used.

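Proof. It follows directly from Sec. 14, Ch. 4, that the bias of the ratio estimate r is approximately

$$A_1 = R(V_{\bar y}^2 - \rho_{\bar x\bar y}V_{\bar x}V_{\bar y}) \qquad (4.1)$$

and the bias of r' is approximately

$$A_2 = \frac{1}{Y}\sum_h Y_hR_h\frac{1 - f_h}{n_h}(V_{hY}^2 - \rho_{hXY}V_{hX}V_{hY}) \qquad (4.2)$$

where

$$V_{hX} = \frac{S_{hX}}{\bar X_h}, \qquad V_{hY} = \frac{S_{hY}}{\bar Y_h}$$

and the other terms are as defined in Sec. 1, 2, and 3. For proportionate stratified sampling Eq. 4.1 and 4.2 become, respectively,

$$A_1 = \frac{1 - f}{n}R\sum_h\frac{N_h}{N}\left(\frac{S_{hY}^2}{\bar Y^2} - \frac{\rho_{hXY}S_{hX}S_{hY}}{\bar X\bar Y}\right) \qquad (4.3)$$

so that, writing

$$A = R\sum_h\frac{N_h}{N}\left(\frac{S_{hY}^2}{\bar Y^2} - \frac{\rho_{hXY}S_{hX}S_{hY}}{\bar X\bar Y}\right) \qquad (4.4)$$

we have

$$A_1 = \frac{1 - f}{n}A \qquad (4.5)$$

and

$$A_2 = \frac{1}{Y}\sum_h Y_hR_h\frac{1 - f}{n_h}(V_{hY}^2 - \rho_{hXY}V_{hX}V_{hY}) \qquad (4.6)$$

and, since $n_h/N_h = f_h = f$ (so that $n_h = nN_h/N$), $Y_hR_h = X_h = N_h\bar X_h$, and $Y = N\bar Y$, we have, writing

$$A' = \frac{1}{\bar YL}\sum_h\bar X_h(V_{hY}^2 - \rho_{hXY}V_{hX}V_{hY}) \qquad (4.7)$$

$$A_2 = \frac{(1 - f)L}{n}A' \qquad (4.8)$$

The values of A and A' are constant for any fixed set of strata.

From Eq. 4.5 and 4.8, and substituting values for $\sigma_r^2$ and $\sigma_{r'}^2$ for proportionate stratified sampling obtained in Sec. 2 and 3, we have

$$\frac{A_1^2}{\sigma_r^2} \doteq \frac{1 - f}{n}\,\frac{A^2\bar Y^2}{\sum_hN_hS_{hZ'}^2/N} \qquad (4.9)$$

$$\frac{A_2^2}{\sigma_{r'}^2} \doteq \frac{1 - f}{n}\,\frac{L^2A'^2\bar Y^2}{\sum_hN_hS_{hZ}^2/N} \qquad (4.10)$$

both of which decrease with increasing size of n if L, A, and A' are fixed. Thus, with a fixed set of strata and for samples large enough, the approximate bias of either r or r' will be trivial relative to the approximate standard deviation.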






Note that, although $A^2\bar Y^2/(\sum_hN_hS_{hZ'}^2/N)$ and $A'^2\bar Y^2/(\sum_hN_hS_{hZ}^2/N)$ depend on the stratification, they are ratios of averages that may not be sensitive to altering the stratification, and sometimes may be affected but slightly as the number of strata is increased.

We shall consider the case where these two ratios are about constant as the number of strata is increased. Suppose, now, that the size of sample is increased by keeping the same average sample taken per stratum, but increasing the number of strata, so that $n/L = \bar n$ = constant. This does not affect Eq. 4.9, which still decreases with increasing n, but now Eq. 4.10 becomes

$$\frac{A_2^2}{\sigma_{r'}^2} \doteq (1 - f)\,\frac{n}{\bar n^2}\,\frac{A'^2\bar Y^2}{\sum_hN_hS_{hZ}^2/N} \qquad (4.11)$$

which increases with increasing n provided f is small for all sizes of sample considered.

Also, if the approximate biases are not zero and if we take the ratio of $A_2$ to $A_1$, we have

$$\frac{A_2}{A_1} = L\,\frac{A'}{A} \qquad (4.12)$$

which will be approximately equal to L when A' = A, which latter condition will hold approximately (from Eq. 4.4 and 4.7) at least when the $\bar X_h$, the $\bar Y_h$, and the $V_{hY}^2 - \rho_{hXY}V_{hX}V_{hY}$ do not differ widely between strata.

Finally, it follows from Eq. 4.9 that $A_1^2$ will be small relative to the variance with a large enough sample (for any number of strata), but from Eq. 4.7 and 4.8 it is seen, with $\bar n$ fixed, and with $\rho_{hXY} \ne V_{hY}/V_{hX}$, that $A_2$ need not decrease as n increases unless the sampling fraction becomes large; but a sufficient condition for $A_2$ to be zero or small is that $\rho_{hXY} = V_{hY}/V_{hX}$.

It appears reasonable to assume that relationships essentially similar to those given above may hold in practice for disproportionate sampling designs, although the exact relationships have not been developed.

5. Difference between the variances of two ratio estimates (Vol. I, Ch. 5, Sec. 4). It is shown below that the difference between the variance of

$$r = \frac{\sum_h N_h \bar{x}_h}{\sum_h N_h \bar{y}_h}$$

and the variance of

$$r' = \frac{1}{Y}\sum_h^L Y_h r_h, \qquad r_h = \frac{\bar{x}_h}{\bar{y}_h}$$

is given by

$$\sigma_r^2 - \sigma_{r'}^2 = \frac{1}{Y^2}\sum_h \frac{1-f_h}{n_h}\,Y_h^2\left[V_{hY}^2\left(R_h - R\right)^2 - 2R_h\left(V_{hY}^2 - \rho_{hXY}V_{hX}V_{hY}\right)\left(R_h - R\right)\right] \tag{5.1}$$

Note that the first term in brackets in Eq. 5.1 is positive and will increase as the variation in the $R_h$ increases, and the second term involves the approximations to the bias of the $r_h$ and will be small when the approximate biases are small, i.e., when $\rho_{hXY} = V_{hY}/V_{hX}$. This is an indication that the larger the differences among the stratum ratios, the greater the gain in using $r'$ rather than $r$, so long as the second term is small. However, if the second term is large relative to the first, then the use of $r'$ should be avoided.

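Derivation of the difference is as follows:

The variance of $r$ is given by Eq. 2.1. The variance of $r'$ is given by Eq. 3.1. The difference is

$$\sigma_r^2 - \sigma_{r'}^2 = \frac{1}{\bar{Y}^2}\sum_h \frac{N_h^2}{N^2}\,\frac{1-f_h}{n_h}\left(S_{hz}^2 - S_{hz'}^2\right) \tag{5.2}$$

Now

$$\begin{aligned}
S_{hz}^2 - S_{hz'}^2 &= \left(S_{hX}^2 + R^2 S_{hY}^2 - 2R\rho_{hXY}S_{hX}S_{hY}\right) - \left(S_{hX}^2 + R_h^2 S_{hY}^2 - 2R_h\rho_{hXY}S_{hX}S_{hY}\right)\\
&= S_{hY}^2\left(R^2 - R_h^2\right) - 2\rho_{hXY}S_{hX}S_{hY}\left(R - R_h\right)\\
&= \bar{Y}_h^2 V_{hY}^2\left(R_h - R\right)^2 - 2\bar{Y}_h^2 R_h\left(V_{hY}^2 - \rho_{hXY}V_{hX}V_{hY}\right)\left(R_h - R\right)
\end{aligned} \tag{5.3}$$

where the last step uses $R^2 - R_h^2 = (R_h - R)^2 - 2R_h(R_h - R)$, $S_{hY} = \bar{Y}_h V_{hY}$, and $S_{hX} = \bar{X}_h V_{hX} = R_h\bar{Y}_h V_{hX}$. Substituting Eq. 5.3 into Eq. 5.2 and substituting $Y = N\bar{Y}$ and $Y_h = N_h\bar{Y}_h$, we obtain Eq. 5.1.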
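As a numerical sanity check on the algebra of Eq. 5.3, the sketch below evaluates both sides for arbitrary stratum parameters. The function names and the random parameter ranges are illustrative assumptions, not part of the text; the only relation used is $\bar{X}_h = R_h\bar{Y}_h$.

```python
import random

def diff_from_definitions(S_X, S_Y, rho, R, R_h):
    """S_hz^2 - S_hz'^2 computed directly from the two definitions."""
    s2_z = S_X**2 + R**2 * S_Y**2 - 2 * R * rho * S_X * S_Y        # uses the overall ratio R
    s2_zp = S_X**2 + R_h**2 * S_Y**2 - 2 * R_h * rho * S_X * S_Y   # uses the stratum ratio R_h
    return s2_z - s2_zp

def diff_from_eq_5_3(S_X, S_Y, rho, R, R_h, Ybar_h):
    """Last line of Eq. 5.3, written with the rel-variances V_hX and V_hY."""
    V_Y = S_Y / Ybar_h
    V_X = S_X / (R_h * Ybar_h)            # Xbar_h = R_h * Ybar_h
    return (Ybar_h**2 * V_Y**2 * (R_h - R) ** 2
            - 2 * Ybar_h**2 * R_h * (V_Y**2 - rho * V_X * V_Y) * (R_h - R))

rng = random.Random(1)
for _ in range(3):
    args = (rng.uniform(1, 5), rng.uniform(1, 5), rng.uniform(-1, 1),
            rng.uniform(0.5, 2), rng.uniform(0.5, 2))
    print(diff_from_definitions(*args), diff_from_eq_5_3(*args, Ybar_h=rng.uniform(1, 10)))
```

The two printed columns should agree to rounding error for any choice of $\bar{Y}_h > 0$, which is what the identity claims.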


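6. Total population variance expressed as a sum of components (Vol. I, Ch. 5, Eq. 5.6). To prove: For a stratified sampling design the variance between elementary units in the population, $\sigma^2$, can be written as the sum of the variance between the stratum means, $\sigma_b^2$, and the variance between elementary units within strata, $\sigma_w^2$, i.e.,

$$\sigma^2 = \sigma_b^2 + \sigma_w^2 \tag{6.1 or I-5.5.6}$$

where

$$\sigma^2 = \frac{1}{N}\sum_h^L\sum_i^{N_h}\left(X_{hi} - \bar{X}\right)^2 \tag{6.2}$$

$$\sigma_b^2 = \frac{1}{N}\sum_h^L N_h\left(\bar{X}_h - \bar{X}\right)^2 \tag{6.3 or I-5.5.4}$$

$$\sigma_w^2 = \frac{1}{N}\sum_h^L\sum_i^{N_h}\left(X_{hi} - \bar{X}_h\right)^2 \tag{6.4 or I-5.5.7}$$

Proof.

$$\begin{aligned}
N\sigma^2 &= \sum_h\sum_i\left(X_{hi} - \bar{X}\right)^2 = \sum_h\sum_i\left\{\left(X_{hi} - \bar{X}_h\right) + \left(\bar{X}_h - \bar{X}\right)\right\}^2\\
&= \sum_h\sum_i\left(X_{hi} - \bar{X}_h\right)^2 + 2\sum_h\sum_i\left(X_{hi} - \bar{X}_h\right)\left(\bar{X}_h - \bar{X}\right) + \sum_h\sum_i\left(\bar{X}_h - \bar{X}\right)^2
\end{aligned}$$

and, since $\sum_i(X_{hi} - \bar{X}_h) = 0$ for each stratum, and since

$$\sum_h\sum_i\left(\bar{X}_h - \bar{X}\right)^2 = \sum_h N_h\left(\bar{X}_h - \bar{X}\right)^2$$

we have

$$\sigma^2 = \frac{1}{N}\sum_h\sum_i\left(X_{hi} - \bar{X}_h\right)^2 + \frac{1}{N}\sum_h N_h\left(\bar{X}_h - \bar{X}\right)^2 = \sigma_w^2 + \sigma_b^2$$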

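7. Gain due to stratification using proportionate sampling (Vol. I, Ch. 5, Eq. 5.3). To prove: The absolute gain due to stratification with proportionate sampling is

$$\sigma_{\bar{x}'}^2 - \sigma_{\bar{x}}^2 \doteq \frac{1-f}{n}\,\sigma_b^2 \tag{7.1}$$

where $\sigma_b^2$ is given by Eq. 6.3, and

$$S_w^2 = \frac{1}{N}\sum_h^L N_h S_h^2 \tag{7.2}$$

where $\sigma_{\bar{x}'}^2$ represents the variance of $\bar{x}$ for a simple random sample, and $\sigma_{\bar{x}}^2$ represents the variance of $\bar{x}$ based on a proportionate stratified sample.

Proof. The variance of a sample mean based on a simple random sample of $n$ elements is (from Ch. 4, Sec. 2)

$$\sigma_{\bar{x}'}^2 = \frac{1-f}{n}S^2 = \frac{1-f}{n}\,\frac{N}{N-1}\,\sigma^2 = \frac{1-f}{n}\,\frac{N}{N-1}\left(\sigma_b^2 + \sigma_w^2\right) \tag{7.3}$$

where $\sigma_b^2$ is given by Eq. 6.3 and $\sigma_w^2$ is given by Eq. 6.4.

The variance of a sample mean based on a proportionate stratified sample of $n$ elements is (from Sec. 1, Remark 1)

$$\sigma_{\bar{x}}^2 = \frac{1-f}{n}S_w^2 \tag{7.4 or I-5.3.10}$$

Therefore,

$$\sigma_{\bar{x}'}^2 - \sigma_{\bar{x}}^2 = \frac{1-f}{n}\left(S^2 - S_w^2\right) \tag{7.5}$$

Assume that $N$ is large enough that the approximation $N/(N-1) \doteq 1$ is good, and the $N_h$ are either so large, or so nearly constant, that the approximation $N_h/(N_h - 1) \doteq N/(N-1)$ is satisfactory. Then $S^2 \doteq \sigma_b^2 + \sigma_w^2$ and $S_w^2 = \frac{1}{N}\sum_h \frac{N_h}{N_h - 1}\sum_i\left(X_{hi} - \bar{X}_h\right)^2 \doteq \sigma_w^2$, so that Eq. 7.5 reduces to $\sigma_{\bar{x}'}^2 - \sigma_{\bar{x}}^2 \doteq \frac{1-f}{n}\,\sigma_b^2$, which is Eq. 7.1.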
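To see how close the approximation of Eq. 7.1 is to the exact gain of Eq. 7.5, the sketch below evaluates both for a made-up population of three strata of 1,000 units each. The population values and the function name are hypothetical choices for illustration only.

```python
from statistics import fmean

def proportionate_gain(strata, n):
    """Exact gain (1-f)/n (S^2 - S_w^2) of Eq. 7.5 and the approximation (1-f)/n sigma_b^2."""
    values = [x for s in strata for x in s]
    N = len(values)
    f = n / N
    xbar = fmean(values)
    means = [fmean(s) for s in strata]
    S2 = sum((x - xbar) ** 2 for x in values) / (N - 1)
    # S_w^2 = (1/N) sum_h N_h S_h^2, with S_h^2 on divisor N_h - 1
    S2_w = sum(len(s) * sum((x - m) ** 2 for x in s) / (len(s) - 1)
               for s, m in zip(strata, means)) / N
    sigma2_b = sum(len(s) * (m - xbar) ** 2 for s, m in zip(strata, means)) / N
    return (1 - f) / n * (S2 - S2_w), (1 - f) / n * sigma2_b

strata = [[10 * h + (i % 7) for i in range(1000)] for h in range(3)]
print(proportionate_gain(strata, 300))
```

With strata this large the two values differ only in the fourth significant figure, consistent with the conditions stated before Eq. 7.1.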



8. Variance between the stratum means with random grouping of elements into strata (Vol. I, Ch. 5, Sec. 5). To prove: If strata are formed by a random grouping of elements, the variance between the stratum means, $\sigma_b^2$, is approximately equal to $LS^2/N$.

Proof. If the $N_h$ elements in the $h$th stratum were obtained by distributing all $N$ elements of the population at random among the $L$ strata so that $N_h$ elements were put into the $h$th stratum, $h = 1, \ldots, L$, then $E\bar{X}_h = \bar{X}$, and

$$E\sigma_b^2 = \frac{\sum_h N_h E\bar{X}_h^2}{N} - \bar{X}^2 = \sum_h \frac{N_h}{N}\left(\bar{X}^2 + \frac{N - N_h}{NN_h}S^2\right) - \bar{X}^2 = \frac{L-1}{N}S^2 \doteq \frac{L}{N}S^2 \quad\text{if } L \text{ is large.}$$

Also, from Sec. 4, Ch. 4, $ES_h^2 = S^2$, where $S_h^2$ is given by Eq. 1.3, provided the strata are made up by random assignment of the units in the population, and therefore $ES_w^2 = S^2$, where $S_w^2$ is given by Eq. 7.2, or $S_w^2 \doteq S^2$ for a reasonably large population.

It follows that $\sigma_b^2 \doteq LS_w^2/N$ provided there are enough strata that $L/(L-1) \doteq 1$.
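The exact expectation $E\sigma_b^2 = (L-1)S^2/N$ derived above can be illustrated by simulation. The sketch below deals a made-up population of 60 values at random into $L$ strata of equal size (an illustrative simplification; the proof allows unequal $N_h$) and compares the Monte Carlo average of $\sigma_b^2$ with the formula.

```python
import random
from statistics import fmean, variance

def average_sigma2_b(values, L, reps, seed=0):
    """Monte Carlo average of sigma_b^2 when the N values are dealt at random into L equal strata."""
    rng = random.Random(seed)
    N = len(values)
    size = N // L                      # assumes N is divisible by L
    xbar = fmean(values)
    total = 0.0
    for _ in range(reps):
        shuffled = list(values)
        rng.shuffle(shuffled)
        strata = [shuffled[h * size:(h + 1) * size] for h in range(L)]
        total += sum(size * (fmean(s) - xbar) ** 2 for s in strata) / N
    return total / reps

def expected_sigma2_b(values, L):
    """(L - 1) S^2 / N, with S^2 the population variance on divisor N - 1."""
    return (L - 1) * variance(values) / len(values)

values = list(range(1, 61))
print(average_sigma2_b(values, 6, 5000), expected_sigma2_b(values, 6))
```

The simulated average should lie within a few percent of the formula; the remaining gap is Monte Carlo noise.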


9. Optimum allocation to strata (Vol. I, Ch. 5, Eq. 8.1). To prove: The values of $n_h$ which minimize the variance

$$\sigma^2 = \frac{1}{N^2}\sum_h^L N_h^2\,\frac{1-f_h}{n_h}\,S_h^2 = \frac{1}{N^2}\sum_h^L N_h^2\left(\frac{1}{n_h} - \frac{1}{N_h}\right)S_h^2 \tag{9.1}$$

subject to the condition

$$\sum_h^L n_h = n \quad\text{or}\quad \sum_h^L n_h - n = 0 \tag{9.2}$$

are

$$n_h = n\,\frac{N_h S_h}{\sum_h N_h S_h} \tag{9.3 or I-5.8.1}$$

where $S_h$ is given by $S_{hx}$, $S_{hz}/\bar{Y}$, or $S_{hz'}/\bar{Y}$, depending on the form of the estimate.

Equation 9.1 is the variance of $\bar{x}$ (Eq. 1.1) when $S_h^2 = S_{hx}^2$ as defined by Eq. 1.2; it is the variance of $r$ when $S_h^2 = S_{hz}^2/\bar{Y}^2$ and $S_{hz}^2$ is defined by Eq. 2.2; and it is the variance of $r'$ when $S_h^2 = S_{hz'}^2/\bar{Y}^2$ and $S_{hz'}^2$ is defined by Eq. 3.2. The form of the variance, and thus of the optimum values of $n_h$, is the same for totals estimated by multiplying the above estimates by a constant.

Proof. To obtain the minimum subject to a fixed size of sample we set up the Lagrangian $F$;* i.e., we define the function

$$F = F_0 + \lambda_1 F_1 + \lambda_2 F_2 + \cdots \tag{9.4}$$

where $F_0$ is the function to be minimized, $F_1$ is the relationship determined by a first condition to be imposed, $F_2$ is the relationship determined by a second condition to be imposed, etc.; and $\lambda_1$, $\lambda_2$, etc., are Lagrange multipliers whose values are obtained as a part of the solution.

* For a discussion of the Lagrange method of obtaining a relative minimum or maximum value of a function subject to conditions see E. Goursat and E. R. Hedrick, Mathematical Analysis, Vol. I, Ginn and Co., 1940, p. 128, Sec. 61, or other texts.

In the present problem $F_0$ is the variance to be minimized and is given by Eq. 9.1; $F_1 = \sum_h n_h - n$ is the condition to be imposed (from Eq. 9.2). There are no other conditions. Consequently, in this particular problem we have

$$F = \frac{1}{N^2}\sum_h N_h^2\left(\frac{1}{n_h} - \frac{1}{N_h}\right)S_h^2 + \lambda\left(\sum_h n_h - n\right) \tag{9.5}$$

We ascertain the optimum values of the $n_h$ by taking derivatives of $F$ with respect to the $n_h$ and setting each of the derivatives equal to zero. This gives $L$ equations in $L + 1$ unknowns (the $n_h$ and $\lambda$). The condition given by Eq. 9.2 makes $L + 1$ equations. With as many equations as unknowns we can solve them simultaneously to obtain the values of the $n_h$ that minimize (or maximize) $F_0$ subject to the condition $F_1$. In cases where there is any doubt, one can examine the results to ascertain whether the solution gives a minimum or a maximum value for $F_0$. Thus,

$$\frac{\partial F}{\partial n_h} = -\frac{N_h^2 S_h^2}{N^2 n_h^2} + \lambda = 0 \qquad (h = 1, \ldots, L)$$

or

$$n_h = \frac{N_h S_h}{N\sqrt{\lambda}} \tag{9.6}$$

Summing over the $L$ equations, we have

$$\sum_h n_h = \frac{1}{\sqrt{\lambda}}\,\frac{\sum_h N_h S_h}{N}$$

and from Eq. 9.2 we substitute $n$ for $\sum_h n_h$ and solve for $\sqrt{\lambda}$. By substituting this value for $\sqrt{\lambda}$ in Eq. 9.6, we obtain

$$n_h = n\,\frac{N_h S_h}{\sum_h N_h S_h} \tag{9.3}$$


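To see that the substitution of $n_h$ given by Eq. 9.3 yields a minimum, we note that the variance

$$N^2\sigma^2 = \sum_h N_h^2\left(1 - \frac{n_h}{N_h}\right)\frac{S_h^2}{n_h}$$

can be written as

$$\begin{aligned}
N^2\sigma^2 &= -\sum_h N_h S_h^2 + \sum_h \frac{N_h^2 S_h^2}{n_h}\\
&= -\sum_h N_h S_h^2 + \sum_h n_h\left(\frac{N_h S_h}{n_h} - K\right)^2 + 2K\sum_h N_h S_h - K^2 n
\end{aligned} \tag{9.7}$$

where $K$ is a constant. The second term of the right-hand member of Eq. 9.7 is the only term involving the individual $n_h$ (the last term depends only on their fixed total $n$). Therefore, $\sigma^2$ will be at a minimum when this term is zero, i.e., when $K = N_h S_h/n_h$, or

$$n_h = \frac{N_h S_h}{K} \tag{9.8}$$

and, summing both sides for $h = 1, 2, \ldots, L$,

$$\sum_h^L n_h = n = \frac{\sum_h N_h S_h}{K}, \qquad\text{so that}\qquad K = \frac{\sum_h N_h S_h}{n}$$

When this value of $K$ is substituted into Eq. 9.8, we obtain Eq. 9.3, the values of $n_h$ which make $\sigma^2$ a minimum.

The variance at the optimum is given by

$$\sigma^2(\text{opt.}) = \frac{1}{n}\left(\frac{\sum_h N_h S_h}{N}\right)^2 - \frac{1}{N}\,\frac{\sum_h N_h S_h^2}{N} \tag{9.9 or I-5.8.5}$$

which is obtained by substituting Eq. 9.3 in Eq. 9.1.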
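A minimal Python sketch of Eq. 9.1, 9.3, and 9.9 follows; the stratum sizes and standard deviations are hypothetical. It returns real-valued $n_h$ (rounding to integers, and capping any $n_h$ that exceeds $N_h$, are practical matters the derivation does not address).

```python
def neyman_allocation(N_h, S_h, n):
    """Eq. 9.3: n_h = n N_h S_h / sum_h N_h S_h (real-valued; rounding is left to the user)."""
    weights = [Nh * Sh for Nh, Sh in zip(N_h, S_h)]
    total = sum(weights)
    return [n * w / total for w in weights]

def stratified_variance(N_h, S_h, n_h):
    """Eq. 9.1: (1/N^2) sum_h N_h^2 (1/n_h - 1/N_h) S_h^2."""
    N = sum(N_h)
    return sum(Nh**2 * (1 / nh - 1 / Nh) * Sh**2
               for Nh, Sh, nh in zip(N_h, S_h, n_h)) / N**2

def optimum_variance(N_h, S_h, n):
    """Eq. 9.9: (1/n)(sum N_h S_h / N)^2 - (1/N)(sum N_h S_h^2 / N)."""
    N = sum(N_h)
    S_bar = sum(Nh * Sh for Nh, Sh in zip(N_h, S_h)) / N
    return S_bar**2 / n - sum(Nh * Sh**2 for Nh, Sh in zip(N_h, S_h)) / N**2

N_h, S_h, n = [400, 300, 300], [2.0, 5.0, 10.0], 100
alloc = neyman_allocation(N_h, S_h, n)
print([round(a, 2) for a in alloc])
print(stratified_variance(N_h, S_h, alloc), optimum_variance(N_h, S_h, n))
```

Plugging the allocation into Eq. 9.1 reproduces Eq. 9.9, and moving sample from one stratum to another while keeping $n$ fixed can only increase the variance, as the completing-the-square argument shows.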


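10. Gain of optimum allocation over proportionate stratified sampling (Vol. I, Ch. 5, Eq. 8.6 and 8.7). To prove: The relative gain over proportionate stratified sampling of optimum allocation of a fixed size of sample to the strata is given by

$$\frac{\sigma^2(\text{prop.}) - \sigma^2(\text{opt.})}{\sigma^2(\text{prop.})} = \frac{V_s^2}{(1-f)\left(1 + V_s^2\right)} \tag{10.1 or I-5.8.6}$$

$$\doteq \frac{V_s^2}{1 + V_s^2} \quad\text{when } f \text{ is negligible} \tag{10.2}$$

where

$$V_s^2 = \frac{\sigma_s^2}{\bar{S}^2}, \qquad \sigma_s^2 = \frac{1}{N}\sum_h N_h\left(S_h - \bar{S}\right)^2, \qquad \bar{S} = \frac{\sum_h N_h S_h}{N}$$

Proof. We have already seen that

$$\sigma^2(\text{prop.}) = \frac{1-f}{n}\,\frac{\sum_h N_h S_h^2}{N} = \frac{N-n}{Nn}\,\frac{\sum_h N_h S_h^2}{N}$$

$$\sigma^2(\text{opt.}) = \frac{1}{n}\left(\frac{\sum_h N_h S_h}{N}\right)^2 - \frac{1}{N}\,\frac{\sum_h N_h S_h^2}{N}$$

Then it follows that

$$\frac{\sigma^2(\text{prop.}) - \sigma^2(\text{opt.})}{\sigma^2(\text{prop.})} = \frac{\sum_h N_h S_h^2/N - \left(\sum_h N_h S_h/N\right)^2}{(1-f)\sum_h N_h S_h^2/N}$$

Substituting $\bar{S}^2 + \sigma_s^2$ for $\sum_h N_h S_h^2/N$ and dividing numerator and denominator by $\bar{S}^2$, we get Eq. 10.1.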
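The identity in Eq. 10.1 can be confirmed numerically. The sketch below, using hypothetical stratum values, computes the relative gain once directly from the two variances and once from $V_s^2$; when all $S_h$ are equal, both give zero gain.

```python
def relative_gain(N_h, S_h, n):
    """Return (direct, via Eq. 10.1) for [sigma^2(prop.) - sigma^2(opt.)] / sigma^2(prop.)."""
    N = sum(N_h)
    f = n / N
    A = sum(Nh * Sh**2 for Nh, Sh in zip(N_h, S_h)) / N      # sum N_h S_h^2 / N
    S_bar = sum(Nh * Sh for Nh, Sh in zip(N_h, S_h)) / N
    var_prop = (1 - f) / n * A
    var_opt = S_bar**2 / n - A / N
    sigma2_s = sum(Nh * (Sh - S_bar) ** 2 for Nh, Sh in zip(N_h, S_h)) / N
    V2 = sigma2_s / S_bar**2
    return (var_prop - var_opt) / var_prop, V2 / ((1 - f) * (1 + V2))

print(relative_gain([400, 300, 300], [2.0, 5.0, 10.0], 100))
```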


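11. Optimum allocation with variable costs between strata (Vol. I, Ch. 5, Eq. 12.1, 12.2, and 12.3). To prove: The optimum allocation to strata of a sample of elements when the cost of including a unit in the sample varies from stratum to stratum is

$$n_h = n\,\frac{N_h S_h/\sqrt{c_h}}{\sum_h\left(N_h S_h/\sqrt{c_h}\right)} \tag{11.1 or I-5.12.1}$$

where $n$, the total size of sample, is determined to yield (1) a minimum variance when the total expenditure is fixed, or (2) a minimum expenditure when the precision is specified at $\epsilon^2$.

For case 1

$$n = \frac{C\sum_h\left(N_h S_h/\sqrt{c_h}\right)}{\sum_h N_h S_h\sqrt{c_h}} \tag{11.2 or I-5.12.2}$$

For case 2

$$n = \frac{\sum_h N_h S_h\sqrt{c_h}\;\sum_h\left(N_h S_h/\sqrt{c_h}\right)}{N^2\epsilon^2 + \sum_h N_h S_h^2} \tag{11.3 or I-5.12.3}$$

Proof. The variance of an estimated average or ratio from a stratified random sample is (see Eq. 9.1)

$$\sigma^2 = \frac{1}{N^2}\sum_h N_h^2 S_h^2\left(\frac{1}{n_h} - \frac{1}{N_h}\right)$$

The cost of the survey is assumed to be of the form

$$C = \sum_h c_h n_h$$

(1) To determine values of $n_h$ which minimize the variance subject to a fixed total cost, $C$, we set up the Lagrangian $F$:

$$F = \frac{1}{N^2}\sum_h N_h^2 S_h^2\left(\frac{1}{n_h} - \frac{1}{N_h}\right) + \lambda\left(\sum_h c_h n_h - C\right)$$

$$\frac{\partial F}{\partial n_h} = -\frac{N_h^2 S_h^2}{N^2 n_h^2} + \lambda c_h = 0$$

Solving for $n_h$, we have

$$n_h = \frac{N_h S_h}{N\sqrt{\lambda}\,\sqrt{c_h}}$$

Summing over $h$ and using $\sum_h n_h = n$ to eliminate $\sqrt{\lambda}$, we have

$$n_h = n\,\frac{N_h S_h/\sqrt{c_h}}{\sum_h\left(N_h S_h/\sqrt{c_h}\right)}$$

When the total cost is fixed at $C = \sum_h c_h n_h$, we have

$$C = n\,\frac{\sum_h N_h S_h\sqrt{c_h}}{\sum_h\left(N_h S_h/\sqrt{c_h}\right)}$$

and the optimum $n$ is given by Eq. 11.2.

(2) To determine the values of $n_h$ that minimize the cost subject to a prescribed precision, we set up the Lagrangian $F$:

$$F = \sum_h c_h n_h + \lambda\left[\frac{1}{N^2}\sum_h N_h^2 S_h^2\left(\frac{1}{n_h} - \frac{1}{N_h}\right) - \epsilon^2\right]$$

Proceeding as in (1) above, we find that the optimum values for the $n_h$ are, again, as given by Eq. 11.1.

When the precision is fixed at

$$\epsilon^2 = \frac{1}{N^2}\sum_h N_h^2 S_h^2\left(\frac{1}{n_h} - \frac{1}{N_h}\right)$$

we have, by substituting for $n_h$ its optimum value,

$$\epsilon^2 = \frac{1}{N^2}\left[\frac{1}{n}\sum_h\left(\frac{N_h S_h}{\sqrt{c_h}}\right)\sum_h N_h S_h\sqrt{c_h} - \sum_h N_h S_h^2\right]$$

and the optimum $n$ is given by Eq. 11.3.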
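A minimal sketch of the cost-optimal allocation, assuming the linear cost function $C = \sum_h c_h n_h$ used in the proof; the stratum values and unit costs are hypothetical. Feeding the variance attained under a fixed budget back in as $\epsilon^2$ should recover the same allocation, since cases (1) and (2) lead to the same $n_h$ of Eq. 11.1.

```python
import math

def stratified_variance(N_h, S_h, n_h):
    """Eq. 9.1, the variance minimized in the proof above."""
    N = sum(N_h)
    return sum(Nh**2 * Sh**2 * (1 / nh - 1 / Nh) for Nh, Sh, nh in zip(N_h, S_h, n_h)) / N**2

def cost_optimal_allocation(N_h, S_h, c_h, C=None, eps2=None):
    """Eq. 11.1, with n from Eq. 11.2 (total cost C fixed) or Eq. 11.3 (variance eps2 fixed)."""
    N = sum(N_h)
    a = [Nh * Sh / math.sqrt(ch) for Nh, Sh, ch in zip(N_h, S_h, c_h)]   # N_h S_h / sqrt(c_h)
    b = [Nh * Sh * math.sqrt(ch) for Nh, Sh, ch in zip(N_h, S_h, c_h)]   # N_h S_h sqrt(c_h)
    if C is not None:
        n = C * sum(a) / sum(b)                                          # Eq. 11.2
    elif eps2 is not None:
        n = sum(b) * sum(a) / (N**2 * eps2 + sum(Nh * Sh**2 for Nh, Sh in zip(N_h, S_h)))  # Eq. 11.3
    else:
        raise ValueError("supply either C or eps2")
    return [n * ai / sum(a) for ai in a]                                 # Eq. 11.1

N_h, S_h, c_h = [400, 300, 300], [2.0, 5.0, 10.0], [1.0, 4.0, 9.0]
alloc = cost_optimal_allocation(N_h, S_h, c_h, C=500)
print([round(x, 2) for x in alloc], stratified_variance(N_h, S_h, alloc))
```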

12. Sample estimates of population variances (Vol. I, Ch. 5, Sec. 14). To prove:

a. A consistent and unbiased estimate of $S_{hx}^2$ as given by Eq. 1.3 is

$$s_{hx}^2 = \frac{\sum_i^{n_h}\left(x_{hi} - \bar{x}_h\right)^2}{n_h - 1} \tag{12.1 or I-5.14.1}$$

b. A consistent estimate of $S_{hz}^2$ as given by Eq. 2.2 is

$$s_{hz}^2 = s_{hx}^2 + r^2 s_{hy}^2 - 2r\,s_{hxy} \tag{12.2 or I-5.14.2}$$

where $s_{hx}^2$ and $s_{hy}^2$ are given by Eq. 12.1, and

$$s_{hxy} = \frac{\sum_i^{n_h}\left(x_{hi} - \bar{x}_h\right)\left(y_{hi} - \bar{y}_h\right)}{n_h - 1} \tag{12.3}$$

c. A consistent estimate of $S_{hz'}^2$ as given by Eq. 3.2 is

$$s_{hz'}^2 = s_{hx}^2 + r_h^2 s_{hy}^2 - 2r_h\,s_{hxy} \tag{12.4 or I-5.14.3}$$

(12.5)

where $s_{hx}^2$ is given by Eq. 12.1 and $\bar{n} = n/L$. Note that for large $\bar{n}$ the second term of Eq. 12.5 can be neglected.
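Estimates b and c have the same form and differ only in which ratio is plugged in ($r$ or $r_h$). The sketch below computes $s_{hz}^2$ from Eq. 12.1 to 12.3 for one stratum's sample and confirms that it equals the ordinary sample variance of $z_{hi} = x_{hi} - r\,y_{hi}$; the sample values and the ratio 1.5 are made up for illustration.

```python
from statistics import variance

def s2_z_from_moments(x, y, r):
    """s_hz^2 = s_hx^2 + r^2 s_hy^2 - 2 r s_hxy (Eq. 12.1-12.3) for one stratum's sample."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    s2x = sum((xi - xb) ** 2 for xi in x) / (n - 1)
    s2y = sum((yi - yb) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / (n - 1)
    return s2x + r**2 * s2y - 2 * r * sxy

def s2_z_direct(x, y, r):
    """The same quantity as the sample variance of z_i = x_i - r y_i."""
    return variance([xi - r * yi for xi, yi in zip(x, y)])

x, y = [3, 5, 8, 6], [1, 2, 4, 3]
print(s2_z_from_moments(x, y, 1.5), s2_z_direct(x, y, 1.5))
```

Passing $r_h$ instead of $r$ gives the estimate $s_{hz'}^2$ of Eq. 12.4.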

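e. An unbiased estimate of $\sigma^2$ as given by Eq. 6.2 based on a proportionate stratified sample is

$$\hat{\sigma}^2 = \frac{1}{n}\sum_h^L\sum_i^{n_h}\left(x_{hi} - \bar{x}\right)^2 + \frac{1-f}{n}\,s_w^2 \tag{12.6}$$

where

$$s_w^2 = \frac{1}{n}\sum_h n_h s_h^2$$

Note that for large $n$ the second term of Eq. 12.6 can be neglected.

Proof. The proofs are left to the reader.

Hint: The proofs for Parts a, b, and c follow the same reasoning as given in Sec. 20 and 21 of Ch. 4. The proof for Part d follows from noting that

$$E\sum_h n_h\left(\bar{x}_h - \bar{x}\right)^2 = E\sum_h n_h\bar{x}_h^2 - nE\bar{x}^2$$

and the proof for Part e follows from noting that

$$\sum_h\sum_i\left(x_{hi} - \bar{x}\right)^2 = \sum_h\sum_i\left(x_{hi} - \bar{x}_h\right)^2 + \sum_h n_h\left(\bar{x}_h - \bar{x}\right)^2$$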

13. Variance for stratification after sampling (Vol. I, Ch. 5, Sec. 16, Remark 3). Suppose that we have a simple random sample of $n$ units. Classify the units of the sample into $L$ classes. Let $x_{hi}$ be the value of the $i$th unit in the $h$th class. Let $N_h$ be the known number of units in the $h$th class in the population, and let $n_h$ be the number of units in the $h$th class in the sample.

Construct the estimate

$$\bar{x} = \frac{1}{N}\sum_h^L\frac{N_h}{n_h}\sum_i^{n_h} x_{hi} \tag{13.1}$$

It is easy to show that $\bar{x}$ is an unbiased estimate of the population mean if we exclude all samples in which one or more of the $n_h$ is zero.

The variance of $\bar{x}$ may be obtained in the following way. First, we note that

$$\sigma_{\bar{x}}^2 = E\left(\bar{x} - \bar{X}\right)^2 = E\left\{E_n\left(\bar{x} - \bar{X}\right)^2\right\}$$

where $E_n$ denotes the expected value for a fixed set of values $n_1, n_2, \ldots, n_L$. But $E_n(\bar{x} - \bar{X})^2$ is then the variance of a sample mean based on a stratified sample with $n_h$ elements from stratum $h$. Therefore,

$$E_n\left(\bar{x} - \bar{X}\right)^2 = \frac{1}{N^2}\sum_h N_h^2\left(\frac{1}{n_h} - \frac{1}{N_h}\right)S_h^2$$

To find the expected value of this expression, we need to evaluate $E(1/n_h)$. This can be evaluated exactly,* but the exact value has a complicated expression. It can be shown, however, that

$$E\left(\frac{1}{n_h}\right) \doteq \frac{N}{nN_h}\left[1 + \frac{1-f}{n}\,\frac{N - N_h}{N_h}\right]$$

This follows from Eq. 17.8 and 17.9 of Ch. 4 with $P = N_h/N$ and $Q = 1 - P$. Hence,

$$\sigma_{\bar{x}}^2 = E\left\{E_n\left(\bar{x} - \bar{X}\right)^2\right\} \doteq \frac{1}{N^2}\sum_h N_h^2 S_h^2\left\{\frac{N}{nN_h}\left[1 + \frac{1-f}{n}\,\frac{N - N_h}{N_h}\right] - \frac{1}{N_h}\right\}$$

and, by substituting $S_w^2 = \sum_h N_h S_h^2/N$, $f = n/N$, and $Q_h = (N - N_h)/N$,

$$\sigma_{\bar{x}}^2 \doteq \frac{1-f}{n}S_w^2 + \frac{1-f}{n^2}\sum_h Q_h S_h^2 \tag{13.2 or I-5.16.1}$$

We note that the first term is precisely the variance for a proportionate stratified sample selected from the strata defined earlier, and that the second term is positive. Hence, the variance for stratification after sampling is larger than the variance for proportionate stratified sampling.

We note further that for a sufficiently large average sample size per stratum, $\bar{n}$, the second term will be small compared with the first term, since the second term is of the order of $1/\bar{n}$ as large as the first. Unless $\bar{n}$ is sufficiently large, however, the net effect of such an approach by "stratification after sampling" may substantially increase the variance over the usual estimate with simple random sampling.
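The two terms of Eq. 13.2 can be evaluated separately to see the penalty for classifying after sampling. In the sketch below (hypothetical stratum sizes and standard deviations), the first term is the proportionate stratified variance and the second is the extra amount; with equal $S_h$ their ratio is exactly $(L-1)/n$, i.e., roughly $1/\bar{n}$.

```python
def poststrat_variance_terms(N_h, S_h, n):
    """The two terms of Eq. 13.2: (proportionate stratified variance, extra term)."""
    N = sum(N_h)
    f = n / N
    S2_w = sum(Nh * Sh**2 for Nh, Sh in zip(N_h, S_h)) / N
    first = (1 - f) / n * S2_w
    second = (1 - f) / n**2 * sum((N - Nh) / N * Sh**2 for Nh, Sh in zip(N_h, S_h))  # Q_h = (N - N_h)/N
    return first, second

print(poststrat_variance_terms([4000, 3000, 3000], [2.0, 5.0, 10.0], 300))
```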


* Frederick F. Stephan, "The Expected Value and the Variance of the Reciprocal and Other Negative Powers of a Positive Bernoullian Variate," Annals Math. Stat., XVI (1945).

14. Increase in variance arising from duplication of a subset of elements (Vol. I, Ch. 5, Sec. 16, Remark 4). To prove: Suppose that from a random sample of $n$ elements we select a random subsample of $n_1$ elements, duplicate these elements, and add them to the original sample. Then the mean based on the $n + n_1$ elements is an unbiased estimate of the population mean, and its variance is greater than the variance of the mean based on the original $n$ elements by the approximate factor

$$\frac{1 + 3n_1/n}{\left(1 + n_1/n\right)^2}$$




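Proof. Let the elements in the random subsample have the values $u_1, \ldots, u_{n_1}$, and the remaining elements in the sample have the values $v_1, \ldots, v_{n-n_1}$. The estimate of the mean based on the sample of $n + n_1$ elements is then

$$\bar{x}' = \frac{2\sum_i^{n_1} u_i + \sum_j^{n-n_1} v_j}{n + n_1}$$

Now,

$$E\bar{x}' = \frac{2n_1\bar{X} + (n - n_1)\bar{X}}{n + n_1} = \bar{X}$$

so that $\bar{x}'$ is unbiased. Also

$$\bar{x}' = \frac{1}{n + n_1}\left(2u + v\right), \quad\text{say, where } u = \sum_i^{n_1} u_i \text{ and } v = \sum_j^{n-n_1} v_j$$

so that

$$\sigma_{\bar{x}'}^2 = \frac{1}{(n + n_1)^2}\left(4\sigma_u^2 + \sigma_v^2 + 4\sigma_{uv}\right)$$

It is readily shown that

$$\sigma_u^2 = \frac{n_1(N - n_1)}{N}S^2, \qquad \sigma_v^2 = \frac{(n - n_1)(N - n + n_1)}{N}S^2, \qquad \sigma_{uv} = -\frac{n_1(n - n_1)}{N}S^2$$

Hence, we have

$$\sigma_{\bar{x}'}^2 = \frac{S^2}{N}\left[\frac{N(n + 3n_1)}{(n + n_1)^2} - 1\right]$$

For the original sample of $n$, the variance of the mean is

$$\sigma_{\bar{x}}^2 = \frac{N - n}{N}\,\frac{S^2}{n}$$

so that

$$\frac{\sigma_{\bar{x}'}^2}{\sigma_{\bar{x}}^2} = \frac{N}{N - n}\left[\left(\frac{n}{n + n_1}\right)^2\left(1 + \frac{3n_1}{n}\right) - \frac{n}{N}\right]$$

If $N$ is large compared to $n$, we have

$$\sigma_{\bar{x}'}^2 \doteq \sigma_{\bar{x}}^2\,\frac{1 + 3n_1/n}{\left(1 + n_1/n\right)^2}$$

The relative loss in efficiency is thus

$$\frac{\sigma_{\bar{x}'}^2 - \sigma_{\bar{x}}^2}{\sigma_{\bar{x}}^2} \doteq \frac{(n_1/n)\left(1 - n_1/n\right)}{\left(1 + n_1/n\right)^2}$$

This has its maximum value (for $0 < n_1/n < 1$) when $n_1/n = 1/3$, and for this value of $n_1/n$, the loss in efficiency is .125.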
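Because the finite-population result is exact, it can be verified by brute force. The sketch below enumerates every sample of $n$ and every subsample of $n_1$ from a small made-up population, using exact rational arithmetic, and compares the resulting variance of $\bar{x}'$ with $(S^2/N)[N(n + 3n_1)/(n + n_1)^2 - 1]$. It also evaluates the large-$N$ relative loss to confirm the maximum of .125 at $n_1/n = 1/3$. Function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def duplication_variance_exact(pop, n, n1):
    """Exact variance of the mean of n + n1 values when n1 of an SRS of n are duplicated."""
    means = []
    for sample in combinations(range(len(pop)), n):
        for sub in combinations(sample, n1):           # every (sample, subsample) pair is equally likely
            total = sum(pop[i] for i in sample) + sum(pop[i] for i in sub)  # subsample counted twice
            means.append(Fraction(total, n + n1))
    mu = sum(means) / len(means)
    return sum((m - mu) ** 2 for m in means) / len(means)

def duplication_variance_formula(pop, n, n1):
    """(S^2/N) [N(n + 3 n1)/(n + n1)^2 - 1], as derived above."""
    N = len(pop)
    mean = Fraction(sum(pop), N)
    S2 = sum((Fraction(x) - mean) ** 2 for x in pop) / (N - 1)
    return S2 / N * (Fraction(N * (n + 3 * n1), (n + n1) ** 2) - 1)

def relative_loss(t):
    """Large-N relative loss in efficiency, t = n1/n."""
    return t * (1 - t) / (1 + t) ** 2

pop = [1, 4, 4, 7, 10, 13]
print(duplication_variance_exact(pop, 3, 1), duplication_variance_formula(pop, 3, 1))
print(relative_loss(1 / 3))
```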




CHAPTER 6 


Simple One- or More Stage 
Cluster Sampling 


DERIVATIONS, PROOFS, AND SOME EXTENSIONS OF 
THEORY FOR CH. 6 OF VOL. I* 

Note. A simple cluster sampling plan is a sampling plan in which (a) the elementary units of the population to be sampled are grouped into clusters, such that each elementary unit is associated with one and only one cluster; and (b) a sample is drawn by using the clusters as sampling units and selecting a simple random sample of the clusters. The clusters are referred to as primary sampling units (psu's) or as first-stage sampling units.

If all elementary units in the selected clusters are included in the sample, the sampling plan is a one-stage sampling plan. If a subsample is selected from each of the selected psu's, with a uniform fraction of the second-stage sampling units selected from each primary unit included in the sample, the sampling plan is referred to as a simple two-stage cluster sampling plan. Additional stages of sampling can be introduced.

In this chapter we develop the theory for these simple cluster sampling designs and for certain extensions of them. Attention is given to optimum sample design with different cost functions. Measures of homogeneity are defined, and their effect on optimum two-stage sample design is considered.

For simplification in some proofs sampling with replacement is assumed. The results obtained are to be regarded as approximations to those for sampling without replacement in cases where the sampling fractions involved are not too large.

* Appropriate references to Vol. I are shown in parentheses after section or subsection headings. The number following I- after some equations gives the chapter, section, and number of that particular equation in Vol. I.

Some notation used in this chapter. The notation in this chapter is an extension of that in Chapter 4, with the listing units (second-stage units) in this chapter corresponding to the sampling units in Chapter 4. Thus, $N$ in this chapter is the total number of second-stage units, and $N_i$ is the number in the $i$th primary unit. The $N$ units are here regarded as grouped into $M$ clusters that serve as primary sampling units. The basic notation of this chapter is as follows:


$M$ = Number of first-stage units (or primary sampling units) in the population.

$m$ = Number of first-stage units in the sample.

$N_i$ = Number of second-stage units (or listing units) in the $i$th first-stage unit in the population.

$n_i$ = Number of second-stage units in the sample from the $i$th first-stage unit in the sample.

$X_{ij}$ = Value of the $X$-characteristic for the $j$th second-stage unit in the $i$th first-stage unit, $i = 1, \ldots, M$, and $j = 1, \ldots, N_i$.

$x_{ij}$ = Value of the $X$-characteristic for the $j$th second-stage unit in the sample from the $i$th first-stage unit in the sample, $i = 1, \ldots, m$, and $j = 1, \ldots, n_i$.

$Y_{ij}$ and $y_{ij}$ are defined similarly for a second characteristic.

Sums are indicated by dropping subscripts.

$$X = \sum_i^M X_i = \sum_i^M\sum_j^{N_i} X_{ij}, \qquad x = \sum_i^m x_i = \sum_i^m\sum_j^{n_i} x_{ij}$$

$$N = \sum_i^M N_i, \qquad n = \sum_i^m n_i$$

Average values per second-stage unit are

$$\bar{X} = X/N, \qquad \bar{X}_i = X_i/N_i, \qquad \bar{x} = x/n, \qquad \bar{x}_i = x_i/n_i$$

Average values per first-stage unit are

$$\bar{\bar{X}} = X/M, \qquad \bar{\bar{x}} = x/m, \qquad \bar{N} = N/M, \qquad \bar{n} = n/m$$

Ratios are

$$R = X/Y, \qquad R_i = X_i/Y_i$$

Simple unbiased estimates of totals are $x'$ and $x'_i$, where $x'$ is the estimate of $X$ and $x'_i$ is the estimate of $X_i$; see Sec. 1 for further definitions of these estimates.


Furthermore, $f_1 = m/M$, and when $n_i/N_i$ is constant for all $i$, $f_2 = n_i/N_i$ and $f = f_1 f_2$.

1. The variance and covariance for a two-stage and for a multi-stage sampling design (Vol. I, Ch. 6, Sec. 6). a. Variance of simple unbiased estimate for two-stage sample. To prove: Let us first consider the case of a population consisting of $M$ first-stage units with $N_i$ second-stage units within the $i$th first-stage unit. Assume that a simple random sample of $m$ first-stage units is selected, and a simple random sample of $n_i$ second-stage units is selected from the $i$th selected first-stage unit. Suppose that the estimate made from the sample is

$$x' = \frac{M}{m}\sum_i^m x'_i = \frac{M}{m}\sum_i^m\frac{N_i}{n_i}\sum_j^{n_i} x_{ij} \tag{1.1}$$

where $x_{ij}$ is the value for the $j$th selected second-stage unit from the $i$th selected first-stage unit. We shall show below that $Ex' = X$, where

$$X = \sum_i^M\sum_j^{N_i} X_{ij}$$

and $X_{ij}$ is the value of the $j$th second-stage unit of the $i$th first-stage unit in the population. We shall also show that the variance of $x'$ is


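$$\sigma_{x'}^2 = M^2\,\frac{M - m}{M}\,\frac{S_{1x}^2}{m} + \frac{M}{m}\sum_i^M N_i^2\,\frac{N_i - n_i}{N_i}\,\frac{S_{2ix}^2}{n_i} \tag{1.2}$$

where

$$S_{1x}^2 = \frac{\sum_i^M\left(X_i - \bar{\bar{X}}\right)^2}{M - 1} \tag{1.3}$$

$$S_{2ix}^2 = \frac{\sum_j^{N_i}\left(X_{ij} - \bar{X}_i\right)^2}{N_i - 1} \tag{1.4}$$

and

$$X_i = \sum_j^{N_i} X_{ij}, \qquad \bar{X}_i = \frac{X_i}{N_i}, \qquad \bar{\bar{X}} = \frac{X}{M} = \frac{1}{M}\sum_i^M X_i \tag{1.5}$$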


Proof. We can write

$$x' = \frac{M}{m}\sum_i^m x'_i$$

where

$$x'_i = \frac{N_i}{n_i}\sum_j^{n_i} x_{ij} = N_i\bar{x}_i$$

Then, by Theorem 6 of Ch. 3 (p. 49),

$$Ex' = \frac{M}{m}\sum_i^m Ex'_i$$

Now $Ex'_i$, by Theorem 14 of Ch. 3 (p. 61), is equal to $E\{E(x'_i \mid b^* = B_i)\}$, where $E(x'_i \mid b^* = B_i)$ means the conditional expected value of $x'_i$, knowing that the $i$th sampled first-stage sampling unit is the $j$th first-stage unit in the population. For short we shall call $E(x'_i \mid b^* = B_i) = E_i x'_i$. Hence

$$Ex' = \frac{M}{m}\sum_i^m E\left(E_i x'_i\right) = \frac{M}{m}\sum_i^m E\left(E_i\,\frac{N'_i}{n_i}\sum_j^{n_i} x_{ij}\right) = \frac{M}{m}\sum_i^m E\left(N'_i\bar{X}'_i\right) = \frac{M}{m}\sum_i^m EX'_i$$

where $N'_i$ and $X'_i$ are to be distinguished from $N_i$ and $X_i$ defined above in that they are random variables and take on different values dependent on which first-stage unit is selected in the sample. The values of $N_i$ and $X_i$ are uniquely associated with the $i$th first-stage sampling unit in the population. Finally, from Sec. 1 of Ch. 4, we have

$$Ex' = \frac{M}{m}\sum_i^m EX'_i = \frac{M}{m}\sum_i^m\frac{1}{M}\sum_{i=1}^M X_i = M\bar{\bar{X}} = X$$


We now wish to express the variance of $x'$ in terms of the two components, one being the contribution arising from the first-stage sampling and the other from the second-stage sampling.

By Theorem 15, Ch. 3 (p. 65), taking the random variable to be $x'$ and $b^* = [1]$, where the expression $[1]$ is used to indicate a fixed sample of first-stage units, we may write

$$\sigma_{x'}^2 = E\sigma_{x'\mid[1]}^2 + \sigma_{E(x'\mid[1])}^2 \tag{1.6}$$

In this case, $\sigma_{x'\mid[1]}^2$ is the conditional variance of $x'$, holding the first-stage units constant; $E(x'\mid[1])$ is the conditional expected value of $x'$, holding the first-stage units constant; and $\sigma_{E(x'\mid[1])}^2$ is the variance of these conditional expected values over all possible samples of first-stage units.*

Consider the first term in the right-hand member of Eq. 1.6. The variance of $x'$ for a fixed set of first-stage units in the sample is the variance of $(M/m)x''$, where $x''$ is the estimated total for the fixed set of $m$ psu's, with each of the $m$ psu's now serving as a stratum. Consequently, by Sec. 1, Ch. 5, the variance of $x'$ for the fixed set of primary units is




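$$\sigma_{x'\mid[1]}^2 = \frac{M^2}{m^2}\sum_i^m N_i'^2\,\frac{N'_i - n_i}{n_i N'_i}\,S_{2ix}'^2$$

Hence, by Theorem 6, Ch. 3 (p. 49),

$$E\sigma_{x'\mid[1]}^2 = \frac{M^2}{m^2}\sum_i^m E\left(N_i'^2\,\frac{N'_i - n_i}{n_i N'_i}\,S_{2ix}'^2\right) = \frac{M}{m}\sum_i^M N_i^2\,\frac{N_i - n_i}{n_i N_i}\,S_{2ix}^2 \tag{1.7}$$

since

$$N_i'^2\,\frac{N'_i - n_i}{n_i N'_i}\,S_{2ix}'^2$$

is a random variable having $M$ possible values, each with probability $1/M$. Equation 1.7 is equal to the second term in the right-hand member of Eq. 1.2 and represents the contribution to the variance due to sampling second-stage units within first-stage units.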

Consider now $\sigma_{E(x'\mid[1])}^2$ in Eq. 1.6, which represents the first-stage contribution to the variance. Now,

$$E(x'\mid[1]) = \frac{M}{m}E\left\{\left(\sum_i^m N'_i\bar{x}_i\right)\Big|\,[1]\right\}$$

where $\bar{x}_i = \sum_j x_{ij}/n_i$ is the sample average per second-stage unit from the $i$th first-stage unit in the sample. Since the first-stage units are held constant, they can be regarded as strata, and from Sec. 1 of Ch. 5 we have

$$E(x'\mid[1]) = \frac{M}{m}\sum_i^m N'_i\bar{X}'_i = \frac{M}{m}\sum_i^m X'_i$$

where the primes are used to indicate that $\bar{X}'_i$ and $X'_i$ are random variables (values for the $i$th first-stage unit in the sample). Hence, by Sec. 2 of Ch. 4,

$$\sigma_{E(x'\mid[1])}^2 = M^2\,\frac{M - m}{Mm}\,\frac{\sum_i^M\left(X_i - \bar{\bar{X}}\right)^2}{M - 1} \tag{1.8}$$

which is equal to the first term in the right-hand member of Eq. 1.2 and represents the contribution to the variance due to sampling first-stage units. Therefore, substituting Eq. 1.7 and 1.8 into Eq. 1.6, we obtain Eq. 1.2.

* Note that Theorem 16 (p. 68) could have been applied in this case, also, and with exactly the same steps as with Theorem 15. Theorem 16 has an advantage when more than two stages of sampling are involved (as in Sec. 4 of Ch. 7).

If the second-stage sampling fractions are uniform, i.e., $n_i/N_i = f_2 = E\bar{n}/\bar{N}$ for all first-stage units, the variance of $x'$ becomes


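$$\sigma_{x'}^2 = M^2\,\frac{M - m}{M}\,\frac{S_{1x}^2}{m} + M^2\bar{N}^2\,\frac{\bar{N} - \bar{n}}{\bar{N}}\,\frac{S_{2x}^2}{m\bar{n}} \tag{1.9}$$

where $S_{1x}^2$ is given by Eq. 1.3, and

$$S_{2x}^2 = \frac{1}{N}\sum_i^M N_i\,\frac{\sum_j^{N_i}\left(X_{ij} - \bar{X}_i\right)^2}{N_i - 1} = \frac{1}{M}\sum_i^M\frac{N_i}{\bar{N}}\,S_{2ix}^2 \tag{1.10}$$

and the rel-variance of $x'$ is given by

$$V_{x'}^2 = \frac{M - m}{M}\,\frac{B^2}{m} + \frac{\bar{N} - \bar{n}}{\bar{N}}\,\frac{W^2}{m\bar{n}} \tag{1.11 or I-6.6.4}$$

where $B^2 = S_{1x}^2/\bar{\bar{X}}^2$ and $W^2 = S_{2x}^2/\bar{X}^2$, and where, in Eq. 1.9 and 1.11, $\bar{n}$ is the expected number of listing units in the sample per psu in the sample.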
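Eq. 1.1 and 1.2 can be checked by brute force on a tiny population. The sketch below, a minimal illustration with made-up psu values and subsample sizes, enumerates every first-stage sample and every set of within-psu subsamples, weights each outcome by its probability, and compares the exact mean and variance of $x'$ with $X$ and with Eq. 1.2. Exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def eq_1_2(psus, m, n_i):
    """Variance of x' from Eq. 1.2; psus[i] lists the X_ij of psu i, n_i[i] is its subsample size."""
    M = len(psus)
    totals = [sum(p) for p in psus]
    X_bb = Fraction(sum(totals), M)                        # average total per psu
    S2_1 = sum((t - X_bb) ** 2 for t in totals) / (M - 1)  # Eq. 1.3
    second = Fraction(0)
    for p, ni in zip(psus, n_i):
        Ni = len(p)
        Xb_i = Fraction(sum(p), Ni)
        S2_2i = sum((x - Xb_i) ** 2 for x in p) / (Ni - 1)  # Eq. 1.4
        second += Ni * (Ni - ni) * S2_2i / ni               # N_i^2 (N_i - n_i)/N_i * S^2/n_i
    return M**2 * Fraction(M - m, M) * S2_1 / m + Fraction(M, m) * second

def exact_mean_and_variance(psus, m, n_i):
    """Enumerate every two-stage sample and return (E x', Var x') exactly."""
    M = len(psus)
    mean = second_moment = Fraction(0)
    for chosen in combinations(range(M), m):
        subsamples = [list(combinations(psus[i], n_i[i])) for i in chosen]
        weight = Fraction(1, comb(M, m))
        for i in chosen:
            weight /= comb(len(psus[i]), n_i[i])
        for pick in product(*subsamples):
            # Eq. 1.1: x' = (M/m) sum_i (N_i/n_i) sum_j x_ij
            x_prime = Fraction(M, m) * sum(Fraction(len(psus[i]) * sum(s), n_i[i])
                                           for i, s in zip(chosen, pick))
            mean += weight * x_prime
            second_moment += weight * x_prime**2
    return mean, second_moment - mean**2

psus = [[1, 3, 5], [2, 2, 8, 10], [4, 9], [6, 7, 11]]
n_i = [2, 2, 1, 2]
print(exact_mean_and_variance(psus, 2, n_i), eq_1_2(psus, 2, n_i))
```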

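b. Covariance of simple unbiased estimate for two-stage sample. To prove: The covariance of $x'$ and $y'$ for a two-stage sampling design is

$$\sigma_{x'y'} = E(x' - X)(y' - Y) = M^2\,\frac{M - m}{M}\,\frac{S_{1XY}}{m} + \frac{M}{m}\sum_i^M N_i^2\,\frac{N_i - n_i}{N_i}\,\frac{S_{2iXY}}{n_i} \tag{1.12}$$

where $x'$ and $y'$ are defined by Eq. 1.1, and where

$$S_{1XY} = \frac{\sum_i^M\left(X_i - \bar{\bar{X}}\right)\left(Y_i - \bar{\bar{Y}}\right)}{M - 1} \tag{1.13}$$

$$S_{2iXY} = \frac{\sum_j^{N_i}\left(X_{ij} - \bar{X}_i\right)\left(Y_{ij} - \bar{Y}_i\right)}{N_i - 1} \tag{1.14}$$

with $X_i$, $\bar{X}_i$, and $\bar{\bar{X}}$ defined as in Eq. 1.5, and with $Y_i$, $\bar{Y}_i$, and $\bar{\bar{Y}}$ similarly defined for the $Y$-characteristic.

The component of the covariance of $x'$ and $y'$ due to first-stage sampling is

$$M^2\,\frac{M - m}{M}\,\frac{S_{1XY}}{m} \tag{1.15}$$

and the component of the covariance of $x'$ and $y'$ due to second-stage sampling is

$$\frac{M}{m}\sum_i^M N_i^2\,\frac{N_i - n_i}{N_i}\,\frac{S_{2iXY}}{n_i} \tag{1.16}$$

Proof. The proof that the covariance, $\sigma_{x'y'}$, for a two-stage sampling design is given by Eq. 1.12 follows the same steps as used in Part a to prove that $\sigma_{x'}^2$ is given by Eq. 1.2 and is left to the reader. Note that Eq. 1.12 becomes Eq. 1.2 when we substitute values of $X$ for the corresponding values of $Y$. Thus, by this substitution $S_{1XX}$ in Eq. 1.13 equals $S_{1x}^2$ in Eq. 1.3, and $S_{2iXX}$ in Eq. 1.14 equals $S_{2ix}^2$ in Eq. 1.4. It follows that $\sigma_{x'x'} = \sigma_{x'}^2$.

As in Eq. 1.9, when uniform sampling fractions are used, the covariance becomes

$$\sigma_{x'y'} = M^2\,\frac{M - m}{M}\,\frac{S_{1XY}}{m} + M^2\bar{N}^2\,\frac{\bar{N} - \bar{n}}{\bar{N}}\,\frac{S_{2XY}}{m\bar{n}} \tag{1.18}$$

where $S_{1XY}$ is given by Eq. 1.13,

$$S_{2XY} = \frac{1}{M}\sum_i^M\frac{N_i}{\bar{N}}\,S_{2iXY} \tag{1.19}$$

and $\bar{n}$ in Eq. 1.18 is the expected number of listing units in the sample per psu in the sample.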

c. Rel-variance of a ratio for two-stage sample. To prove: Let the estimate of the ratio be

$$r = \frac{x'}{y'} \tag{1.20}$$

where $x'$ is defined by Eq. 1.1 and $y'$ is similarly defined for the $Y$-characteristic. Then the rel-variance of $r$ can be written as

$$V_r^2 = \frac{M - m}{M}\,\frac{S_1^2}{m\bar{\bar{X}}^2} + \frac{M}{mX^2}\sum_i^M N_i^2\,\frac{N_i - n_i}{N_i}\,\frac{S_{2i}^2}{n_i} \tag{1.21}$$

where

$$S_1^2 = S_{1x}^2 + R^2 S_{1y}^2 - 2RS_{1XY} \tag{1.22}$$

with $S_{1x}^2$ defined by Eq. 1.3, $S_{1y}^2$ similarly defined, $S_{1XY}$ defined by Eq. 1.13, and $R = X/Y$, and where

$$S_{2i}^2 = S_{2ix}^2 + R^2 S_{2iy}^2 - 2RS_{2iXY} \tag{1.23}$$

with $S_{2ix}^2$ defined by Eq. 1.4, $S_{2iy}^2$ similarly defined, and $S_{2iXY}$ defined by Eq. 1.14.

Proof. By Sec. 11 of Ch. 4, we have

$$V_r^2 \doteq \frac{\sigma_{x'}^2}{X^2} + \frac{\sigma_{y'}^2}{Y^2} - 2\,\frac{\sigma_{x'y'}}{XY} \tag{1.24}$$

where $\sigma_{x'}^2$ and $\sigma_{y'}^2$ are given by Eq. 1.2, and $\sigma_{x'y'}$ is given by Eq. 1.12. By making the substitutions indicated above for $\sigma_{x'}^2$, $\sigma_{y'}^2$, and $\sigma_{x'y'}$ in Eq. 1.24 and combining the first-stage contributions to the variance and covariance for $X$ and $Y$, we obtain the first term of Eq. 1.21. Similarly, combining the second-stage contributions, we obtain the second term of Eq. 1.21.

When uniform sampling fractions are used, i.e., $n_i/N_i = E\bar{n}/\bar{N}$, the rel-variance of $r$ becomes

$$V_r^2 = \frac{M - m}{M}\,\frac{B^2}{m} + \frac{\bar{N} - \bar{n}}{\bar{N}}\,\frac{W^2}{m\bar{n}} \tag{1.25 or I-6.6.10}$$

where

$$B^2 = B_X^2 + B_Y^2 - 2B_{XY} \tag{1.26 or I-6.6.11}$$

with

$$B_{XY} = \frac{S_{1XY}}{\bar{\bar{X}}\,\bar{\bar{Y}}}, \qquad B_X^2 = B_{XX}, \qquad B_Y^2 = B_{YY} \tag{1.27 or I-6.6.8}$$

$S_{1XY}$ being given by Eq. 1.13, and

$$W^2 = W_X^2 + W_Y^2 - 2W_{XY} \tag{1.28}$$

$$W_{XY} = \frac{S_{2XY}}{\bar{X}\bar{Y}}, \qquad W_X^2 = W_{XX}, \qquad W_Y^2 = W_{YY} \tag{1.29 or I-6.6.9}$$

with $S_{2XY}$ given by Eq. 1.19 from the $S_{2iXY}$ of Eq. 1.14, and where in Eq. 1.25, $\bar{n}$ is the expected number of listing units in the sample per psu in the sample. Equation 1.25 follows directly from Eq. 1.21 with $n_i/N_i = E\bar{n}/\bar{N}$.

d. Variance of simple unbiased estimate for multi-stage sample. The variance for a simple random sample of $m$ first-stage units selected without replacement from $M$ first-stage units can be expressed in terms of the contribution to the variance from the first-stage units and the combined contribution from all subsequent stages of sampling.

Let $\bar{u} = \sum_i^m u_i/m$ be an unbiased estimate of $\bar{U}$, where $u_i$ is an estimate obtained from the $i$th selected first-stage unit of a sample of $m$ first-stage units, $i = 1, 2, \ldots, m$; and also let

$$E_i u_i = U_i, \qquad i = 1, 2, \ldots, m$$

where $E_i$ designates a conditional expected value for the $i$th selected primary unit (which is, say, the $j$th unit in the population of psu's).

To prove:

$$\sigma_{\bar{u}}^2 = \frac{M - m}{M - 1}\,\frac{\sigma_u^2}{m} + \frac{1}{Mm}\sum_i^M\sigma_i^2 \tag{1.30}$$

where

$$\sigma_u^2 = \frac{1}{M}\sum_i^M\left(U_i - \bar{U}\right)^2 \tag{1.31}$$

$$\sigma_i^2 = E_i\left(u_i - U_i\right)^2 \tag{1.32}$$

and $\sum_i^M\sigma_i^2/(Mm)$ represents the contribution to the variance from all subsequent stages of sampling.

Proof. By Theorem 15, Ch. 3 (p. 65),

$$\sigma_{\bar{u}}^2 = E\sigma_{\bar{u}\mid[1]}^2 + \sigma_{E(\bar{u}\mid[1])}^2 \tag{1.33}$$

where the second term represents the contribution to the variance from first-stage sampling, and the first term represents the contribution from all subsequent stages of sampling. From Sec. 1a,

$$\sigma_{E(\bar{u}\mid[1])}^2 = \frac{M - m}{M - 1}\,\frac{\sigma_u^2}{m} \tag{1.34}$$

Also, since the subsampling is carried out independently in each psu, and since the first-stage units can be regarded as $m$ strata when the $m$ first-stage units are held constant, it follows that

$$\sigma_{\bar{u}\mid[1]}^2 = \frac{1}{m^2}\sum_i^m E_i\left(u_i - E_i u_i\right)^2 = \frac{1}{m^2}\sum_i^m\sigma_i'^2$$

From Theorem 6, Ch. 3 (p. 49), we have

$$E\sigma_{\bar{u}\mid[1]}^2 = \frac{1}{m^2}\sum_i^m E\sigma_i'^2 = \frac{1}{m^2}\sum_i^m\frac{1}{M}\sum_j^M\sigma_j^2 = \frac{1}{Mm}\sum_i^M\sigma_i^2 \tag{1.35}$$

Substituting Eq. 1.34 and 1.35 into Eq. 1.33, we have $\sigma_{\bar{u}}^2$ as given by Eq. 1.30.


2. Estimates of the total variance and of the total covariance for a multi-stage sampling design where the first-stage units are selected with simple random sampling (Vol. I, Ch. 6, Sec. 7, Remark). Assume that it is desired to estimate the total variance and covariance of unbiased estimates from a multi-stage design without estimating the components of the variance and covariance.

a. The estimates $s_{\bar{u}}^2$ and $s_{\bar{u}\bar{w}}$ given below are unbiased estimates of the total variance and covariance as long as there are at least two first-stage units in the sample, selected with replacement. The subsequent stages of sampling are not restricted either as to the number of stages of sampling or the method of sampling, so long as the $u_i$ are unbiased estimates of $U$, where $u_i$ is an estimate made from the $i$th psu in the sample, and the subsampling in any psu is independent of that in any other psu.

Let $u_1, u_2, \ldots, u_m$ be $m$ unbiased estimates of $U$ made from each of $m$ independently selected psu's.

To prove:

$$s_{\bar{u}}^2 = \frac{\sum_i^m\left(u_i - \bar{u}\right)^2}{m(m - 1)} \tag{2.1}$$

is an unbiased estimate of

$$\sigma_{\bar{u}}^2 = E\left(\bar{u} - U\right)^2 \tag{2.2}$$

where $\bar{u} = \sum_i^m u_i/m$.

Proof. From Corollary 1 of Theorem 11, Ch. 3 (p. 56),

$$\sigma_{\bar{u}}^2 = \frac{\sigma_u^2}{m} \tag{2.3}$$

where

$$\sigma_u^2 = E\left(u_i - U\right)^2 \tag{2.4}$$

From Sec. 4 of Ch. 4, we have

$$Es_{\bar{u}}^2 = \frac{\sigma_u^2}{m} = \sigma_{\bar{u}}^2 \tag{2.5}$$

b. Assume that $u_i$ and $\bar{u}$ are unbiased estimates of $U$, $w_i$ and $\bar{w}$ are unbiased estimates of $W$, $i = 1, 2, \ldots, m$, and $m$ of the $M$ primary units are independently selected by simple random sampling with replacement.

To prove:

$$s_{\bar{u}\bar{w}} = \frac{\sum_i^m\left(u_i - \bar{u}\right)\left(w_i - \bar{w}\right)}{m(m - 1)} \tag{2.6}$$

is an unbiased estimate of

$$\sigma_{\bar{u}\bar{w}} = E\left(\bar{u} - U\right)\left(\bar{w} - W\right)$$

The proof is left to the reader.
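A minimal sketch of the estimators in Eq. 2.1 and 2.6, computed from the psu-level estimates $u_i$ and $w_i$ alone (function names are hypothetical). For the single-stage case with replacement, the helper averages Eq. 2.1 over every equally likely ordered draw of $m$ psu's, which should reproduce $\sigma_u^2/m$ exactly, as Eq. 2.5 states.

```python
from fractions import Fraction
from itertools import product

def s2_ubar(u):
    """Eq. 2.1: sum (u_i - ubar)^2 / (m (m - 1)); requires at least two psu's."""
    m = len(u)
    if m < 2:
        raise ValueError("need at least two first-stage units")
    ubar = sum(u) / m
    return sum((ui - ubar) ** 2 for ui in u) / (m * (m - 1))

def s_ubar_wbar(u, w):
    """Eq. 2.6: sum (u_i - ubar)(w_i - wbar) / (m (m - 1))."""
    m = len(u)
    ubar, wbar = sum(u) / m, sum(w) / m
    return sum((ui - ubar) * (wi - wbar) for ui, wi in zip(u, w)) / (m * (m - 1))

def mean_s2_with_replacement(U, m):
    """Average of Eq. 2.1 over all m-tuples drawn with replacement from U (no subsampling)."""
    U = [Fraction(x) for x in U]
    tuples = list(product(U, repeat=m))
    return sum(s2_ubar(t) for t in tuples) / len(tuples)

print(s2_ubar([10, 12, 15, 11]), s_ubar_wbar([10, 12, 15, 11], [3, 4, 6, 3]))
print(mean_s2_with_replacement([1, 4, 9], 2))
```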

c. Assume that a simple random sample of $m$ first-stage units is selected without replacement.

To prove:

$$Es_{\bar{u}}^2 = \frac{M}{M - 1}\,\frac{\sigma_u^2}{m} + \frac{1}{Mm}\sum_i^M\sigma_i^2 \tag{2.7}$$

$$Es_{\bar{u}\bar{w}} = \frac{M}{M - 1}\,\frac{\sigma_{uw}}{m} + \frac{1}{Mm}\sum_i^M\sigma_{iuw} \tag{2.8}$$

The proof is left to the reader. As the following hint indicates, the only complication is the need for using finite sampling corrections.

Hint: Use the following equations:

$$Es_{\bar{u}}^2 = E\left[\frac{\sum_i^m u_i^2 - m\bar{u}^2}{m(m - 1)}\right], \qquad E\bar{u}^2 = \sigma_{\bar{u}}^2 + \bar{U}^2$$

$$\sigma_{\bar{u}}^2 = \frac{M - m}{(M - 1)m}\,\sigma_u^2 + \frac{1}{Mm}\sum_i^M\sigma_i^2$$

Note that $Es_{\bar{u}}^2$ is not an unbiased estimate of $\sigma_{\bar{u}}^2$ given by Eq. 1.30, but the bias will be small if $m/M$ is small.
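The size of that bias is easy to see in the single-stage case, where every $\sigma_i^2 = 0$. The sketch below, using a made-up set of psu values, averages Eq. 2.1 over all samples drawn without replacement and compares the result with Eq. 2.7 and with the true variance from Eq. 1.30; the difference works out to $\sigma_u^2/(M-1)$, which is small relative to $\sigma_u^2/m$ when $m/M$ is small.

```python
from fractions import Fraction
from itertools import combinations

def mean_s2_without_replacement(U, m):
    """Average of s_ubar^2 (Eq. 2.1) over all samples of m psu's drawn without replacement."""
    samples = list(combinations(U, m))
    total = Fraction(0)
    for s in samples:
        ubar = Fraction(sum(s), m)
        total += sum((ui - ubar) ** 2 for ui in s) / (m * (m - 1))
    return total / len(samples)

def single_stage_terms(U, m):
    """(Eq. 2.7, Eq. 1.30, sigma_u^2) with all sigma_i^2 = 0."""
    M = len(U)
    Ubar = Fraction(sum(U), M)
    sigma_u2 = sum((Ui - Ubar) ** 2 for Ui in U) / M
    return Fraction(M, M - 1) * sigma_u2 / m, Fraction(M - m, M - 1) * sigma_u2 / m, sigma_u2

U = [2, 3, 7, 11, 12]
print(mean_s2_without_replacement(U, 3), single_stage_terms(U, 3))
```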

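d. Assume that a simple random sample of $m$ first-stage units is selected, and let $\bar{u} = \sum_i^m u_i/m$. Assume, further, that a simple random sample of $m'$ units is selected from the $m$ first-stage units originally selected. Let

$$\bar{u}' = \frac{\sum_i^{m'} u_i}{m'} \qquad\text{and}\qquad s_{\bar{u}}'^2 = \frac{\sum_i^{m'}\left(u_i - \bar{u}'\right)^2}{m(m' - 1)}$$

To prove: If the $m$ and $m'$ units are selected without replacement,

$$Es_{\bar{u}}'^2 = Es_{\bar{u}}^2$$

where $s_{\bar{u}}^2$ is given by Eq. 2.1 and $Es_{\bar{u}}^2$ is given by Eq. 2.7.

Proof. The proof follows from the fact that

$$E\left[\frac{\sum_i^{m'}\left(u_i - \bar{u}'\right)^2}{m' - 1}\,\Big|\,u_1, \ldots, u_m\right] = \frac{\sum_i^m\left(u_i - \bar{u}\right)^2}{m - 1} \tag{2.9}$$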


(Voi. t‘ch‘e 

a Let be given by Eq. 1.1; then, if M is large relative to m, a con¬ 
sistent estimate of 0 %. (Eq. 1.2) is 




where 


m{m' — 1 ) 


(3.1) 


j. N- >“ 

= 2x,, = Mx: 


^ — 


mi 

rn' 


(3.2) 


and m' is the number of first-stage units selected by simple random 
Lfined”fn*Lr*™^*^ variance. The remaining terms in Eq. 3.2 are 

To prove: Esp — cr|., when M is large relative to m 

sh^wnlhat^*’" 2.9, where it has been 

in 


£y:-; 


M— I m 


+ 


Mm 


(3.3) 




154 

where 


SIMPLE CLUSTER SAMPLING 


Ch. 6 


,2 _ 




l(Ui-Uf 

M 


and 

Now let 

Then 

since 


Ui = and u — A 
U, = MX, 


= Mx'^ and Efit^ = X,- 
and U = MX ='EXt. Also, 

m' 


= /? 


m(ni' — 1) 

Remark 1. It will be convenient for use in subsequent sections to let 


(3,4) 


and 


2 ^2 __ ~ 


m' 

X = 

m 


(3.5) 


From U, = MX. it follows that 


M 




M 


(3.6) 


Hence, MalUM- 1) = where Six is given by Eq. 1.3. 

From we have 

< = Elu, - E,uif = M^Eix\ - X,f 

where 8\^x is given by Eq. 1.4. Substituting Eq. 3.6 and 3.7 in Eq. 3.3, 
we have 


(3.7) 




M 


Es’i = £ 5 ;? = ^ ^ I A| -SI. . (3.8) 


m m ■ N.n^ 


Note that Es^ approaches a% as given by Eq. 1.2 when M is large relative 
to m. The second term of Eq. 3.8 is the same as the second term of 



Sec. 3 


ESTIMATES OF VARIANCE 


155 


Eq. 1.2. The ratio of the first term of Eq. 3.8 to that of Eq. 1.2 is 
MUM- m), which is near 1 for M large relative to m. Then, for M 
large relative to m, 



(3.9) 


is a consistent estimate of (Eq. 1.11), where V? is given by Eq. 3.1 
and x' by Eq. 1.1. 


Remark 2. The proof that is a consistent estimate of al', with M large 
relative to rn follows immediately from Sec. 2^/, when it is recognized that 
having M large relative to m is equivalent to sampling the m units with 
replacement. Thus, it was shown in Sec. la that 


If we let 

m' 

2(w^ - uf 

m. = E-V , ,, = <r| 

m{m — 1 ) 

and 

^"1 sT 

i! 

aT 

then it follows that 

^ ~ Z Z^w = 

m r ttif 


Es^ ^ al 


b. When the sampling fractions within first-stage units are uniform for 
all first-stage units and m' = m, the estimate v% (Eq. 3.9) becomes 

m 

m{m— l)x^ mx^ (3.10) 

where 


„2 2 (*.- - 
^cX — ' Z 

m~ 1 

This can readily be verified by making the following substitutions which 
hold in this special case: 

/i/2=/ 


/1/2 


, X _ X 

J\h ^ 


(3.11) 






156 SIMPLE CLUSTER SAMPLING Ch. 6 

For this case a more accurate estimate of the rei-variance is 


= (3.12) 

mx^ 

The reader can verify that this estimate is a closer approximation of 

than is 

Note that as given by Eq. 3.5 with m' m and a uniform sub- 
sampling fraction is related to s^x follows: 

Note also that s^x as defined by Eq. 3.10 is referred to as the variance 
between ultimate clusters, and as given by Eq. 3.12 or 3.10 is referred 
to as an ultimate cluster estimate of the rel-variance of x\ The term 
ultimate cluster is used to denote the units in the sample from a primary 
sampling unit. 

c. We shall next consider an estimate of the rei-variance of r = x'jy' 
for a two-stage design. The rei-variance of r is given by 

1/2 ^ I i 

XY 


From Sec. 2b, and with terms defined in Part a above, an unbiased 


estimate of is 


, _ ^ a)(g, -^) _M^ !(/ - x'){y', - y') . 

m(m' - 1) m m'- \ m ' 

Similarly 


- yf - ff __ ^ . 


m{m' — 1) m m' — 1 
Therefore, a consistent estimate of is 


m 


1 


where 


.,2 ^ 4 _ __ 2 = - 

^ x'^ x'y' x'^ m 


= 4 ' + - 2rSxY 


(3.14) 

(3.15) 

(3.16) 


with the i/, 4?, and defined by Eq. 3.1, 3.14, and 3.13, respectively. 

When the sampling fractions within first-stage units are uniform for all 
first-stage units and m' = rn, the estimate above (Eq. 3.15) becomes 

j,2 __ 4.r _l_ 4f _ 2 A-Ji: v 
^ mx^ my^ rtixy 


(3.17) 





Sec. 3 


ESTIMATES OF VARIANCE 


where is defined in Eq. 3.10 above, s^y/mf is similarly defined 

and 

m 


^cXY 

- 4(2/.- - 9) 

(3.18) 

mxy 

m{m — \)xy 

An improved estimate is 



m 

( I flz _ ^^cxt\ 
y^ xTj J 

(3.19) 


which should be used when the over-all sampling fraction, /, is large. 


Remark 3. An estimate of S^, the variance between listing units in the 
population, may be made from a cluster subsample as follows: 


m nt _ 

^2 _ S 

n — 1 


(3.20) 


Although is not an unbiased estimate of its bias will be trivial whenever 
the number of primary units in the sample is large. Thus, when the primary 
units are equal in size, i.e., i\T = iV", r y 


Es^ = S^- 


N 


1 - 


N~ 1 
n- 1 


SL ^ ~ 1 

N M- 1 


dn 


M- 1 Jl 


(3.21) 


and the remainder term will be small for m (and thus N and n) large. 
(The terms in Eq. 3.21 are defined below.) Equation 3.21 follows readilv 
from the fact that ^ 


and 

where 


^2 24 = 




nX‘ 


Ex^ = 0-| q- 


G% 


M ~ m al N ~ n af 


M~ 1 


_____ w 

N — \ mn 


- (Tb -f nl 

M ^ 

_ 2(t, - xY 
" . M 

M Ni ^ 

^ 2 2(4 - X.Y 

Gn) — zyr --— 

MN 

(3 _ - 0^7^ 

(N~ \)a^lN 


N 


N- 1 




M Ni 

1 liNi - xf 

N 





158 SIMPLE CLUSTER SAMPLING Ch. 6 

4. Estimates of the components of the rel-variance of a ratio estimate 
for a two-stage sampling design (Vol. I, Ch. 6, Sec. 7). Consistent 
estimates of the components of the rel-variance of a raUo, r, for a two- 
stage sampling design may be made as follows: The estimate, r, is given 
by Eq. 1.20. The rel-variance of r is given by Eq. 1.21. 

a. To prove: A consistent estimate of the within-psu component of the 
rel-variance^ i.e., of the second term in Eq. 1.21, is 


ni 

1 m 


A, 


x''^ m 


m 


(4.1) 


where m' is the number of units selected from the m first-stage units for 
estimating the variance. 


with 


sli = 4.ix + 2r52ixr 

Hj — 1 


(4.2) 

(4.3) 


slix = ^^txx and 


Proof. The within component of the rel-variance may be rewritten 


_1_^ 

tn 


E A'l N, - n, 


1- 

n 


N,- 


SI 


M 


Now, 


m' N • — n- 

yPffU - 


m 


(4.4) 


(4.5) 


is an unbiased estimate of the term in brackets in Eq. 4.4. We need to 
find a consistent estimate of for Eq. 4.5 to be a consistent estimate of 
the term in brackets in Eq. 4.4. Since the n, elements are a sinip e random 
sample from the N, elements, it follows from Sec. 4, Ch. 4, that 


EiS^iXY — Ei 


n,— 1 


2iXY 


(4.6) 


where 

EAy == SliY 


is defined by Eq. 1.14. Similarly, EAx^^hx and 
Since, also by Sec. 21, Ch. 4, r is a consistent estimate of 



Sec. 4 COMPONENTS OF REL-VARIANCE OF RATIO 
R, is a consistent estimate of Rl, and 

n. 


1^-^ 


N. 


2 . o2 
^2^ 


m 


159 


(4.7) 


is a consistent estimate of the term in brackets in Eq. 4.4. Also, since 
X is an unbiased estimate of Z, Eq. 4.1 is a consistent estimate of the 
within component, namely, Eq. 4.4. 

For a simple two-stage design with uniform sampling fractions, the 
rel-variance whose components we wish to estimate is given by Eq 1 25 
In this case m/d/ =/„ „,/A, =/„ a;' = = nxlf,f„ . y' 

Wi/ 2 » and n = mn. When m' = m, the estimate of the within 
component (Eq. 4.1), with the above substitutions, becomes 




2 o 


where 


mn 


w 


_ ,,,2 


W 


XY 


hXY 


^2XY 

xy 


J m 

“2- 
« i n, 


2w 


XY 


Tli 


2(%-£i)(2/,,-wT 


(4.8) 

(4.9 or I-6.7.6) 

(4.10) 


1 i 

^'x = ^xx and ivf. = Wyy 
4x S^XX and sly = S^yy 

b. To prove: A consistent estimate of the between-psu component of the 
rel-variance, i.e., of the first term in Eq. 1.21, is 


1 M^M-m 


x'^ m 


M 


(s^-s-^) 


where is given by Eq. 4.7, 


'I + '■^■51' - IrsxY 


(4.11) 


(4.12) 


■s'r, and are defined in Eq. 3.5 and 3.13. 
Proof. From Eq. 3.8 it follows that 


Esx — S^x + 


^ «,■ A, 
M 


(4.13) 






160 

Since also 


SIMPLE CLUSTER SAMPLING 


Ch. 6 


Esv — + 




•r — ^ir I 


ESxy ~ ^IXY 


2 Y, ^WT 

M 


(4.14) 


(4.15) 


and r is a 
estimate of 


consistent estimate of R, it follows that is a consistent 


Sl + 


MNfN, 
^ n 


N, 


SI 


M 


(4.16) 


where is given by Eq. 1.22 and ;S|j is given by Eq. 1.23. Hence, the 
term in parentheses in Eq. 4.11 is a consistent estimate of * 1 , and, since 
is an unbiased estimate of X, Eq. 4.11 is a consistent estimate of the 
between-psu component. 

For a simple two-stage design, with uniform sampling fractions, the 
estimate of the between-psu component (Eq. 4.11) when m' = m becomes 


i-,A 




m 


where 

with 


*2 = bl b\ - Ibx!' (4-17 or 1-6.7.11) 

bxY = ^ l^cxr - «(1 -f 2 )s 2 XY] (4.18 or 1-6.7.13) 

xy 

-5'cXF defined by Eq. 3.18, 

> y 

x = y 

m 1 X 1 

b\=^bxx^ by^byY 


Remark. When the number of second-stage units used to estimate the 
variance is n't, not necessarily equal to «*, the estimate of the within-psu 
component is given by Eq. 4.1 with 


n i 

"ST F /V 



Sec. 5 REL-VARIANCE OF RATIO IN TERMS OF d 161 

The estimate of the between-psu component is given by 

(4.20) 


1 M^M-m ^ 


where is given by with :r< = and yi is similarly defined. 


;,'2 __ 


m' 


(4.21) 

tte coraSenK the estimates of 

5. Rel-variance of a ratio estimate expressed in terms of 6 , the measure 
of homogeneity; an estimate of <5 from the sample (Vol. I, Ch. 4, Sec. 8). a. 

o prove: For a simple two-stage sampling design (see Note on p. 142) 
the rel-vanance of r given by Eq. 1.25 is approximately equal to 




(5.1 or I-6.8.6) 


where and d are defined by Eq. 5.4 and 5.5 below, and in Eq. 5.1 

through Eq 5.10, « is the expected number of listing units in the sample 
per psu in the sample. ^ 

Proof. From Eq. 1.25, the rel-variance of r is 


F2 = 


M— m N— n 


(5.2 or 1-6.6.10) 


Mm N mn 

and if m is small relative to M 

Mm' N mn 

where and are defined by Eq. 1.26 and 1.28. By definition 
f2 
and 


M—l „ TV —1 
• d- 


M 
M- 1 


N 

f2 


(5.3) 


(5.4 or 1-6.8.10) 


t3 = 


From Eq. 5.4 and 5.5 we obtain 


— 

M N 

(N- \)f^lN 


M- 1 , 


(5.5 or 1-6.8.11) 


(5.6) 




162 

and 


SIMPLE CLUSTER SAMPLING 


Ch. 6 
(5.7) 


= f2(i _ 

If we substitute in Eq. 5.3 for from Eq. 5.6 and for from Eq. 5.7 
and simplify, we obtain 


f2 

Vf = — [1 + d{n~ 1)] 


mn 


It is easy to show, also, that 


K? = + <5(«- 1)] 


(5.8) 


(5.1) 


mn 


and ordinarily Eq. 5.1 is a closer approximation to Eq. 5.2 than is Eq. 5.8. 
The reader can readily verify this by showing that the differences are 


Eq. 5.8 - Eq. 5.2 = 


m — I 
M m 

N 


(5.9) 


NM I"® v) m-1 wi / 


Eq. 5.1 - Eq. 5.2 = 

2 y irx \ -tT/ - - 

(5,10) 

and that Eq. 5.10 is smaller in absolute value than Eq. 5.9 provided 
m > 1 and > 0, as will commonly be the case. 


Exercises 


5.1. When N, = N, show that 


^ _ 1 |72 ^ |?’2 

N 


where 

and 


= Fx + V\ — 2VxY 

(5.11) 

M 


(A- \)XY 

(5.12) 

11 

11 

(5.13) 

= N for all / and define 



Show that 


w n, 

mh 


"I — -5— -J- V\[\ + ^{n — 1)] 
mn 


(5.14 or I-6.8.1) 



Sec. 5 REL-VARIANCE OF RATIO IN TERMS OF d 

where 

M N, 

y\ = 2 ^ N 

(N~ 1)X^ N- 1 

d = 

0 - i)a^lN 
and 

M ^ 

I(Xi - xF 

b. A consistent estimate of d (Eq. 5.5) can be made by substituting 
consistent sample estimates, term for term, for the population values 
involved m the definition of d. A simpler but equivalent estimate 
(provided M is large) is given by Eq. 5.18. 

To prove: A consistent estimate of 6 is 


163 

(5.15) 

(5.16) 

(5.17) 


(5' = 


where 


si + n(n — 1)5' 


(5.18 or 1-6.8.14) 


with six, sly, and s,xr defined by Eq. 3.10 and 3.18, and where 

4 = six + r^sly — 2rs2xy (5.19) 

with j| y; s\y, and defined by Eq. 4.10. 

Proof. We first note that if M is large, Eq! 5.5 can be written as 


since, in Eq. 5.5, 






81 


NSl 


Sj + N(IV~ 1).S| 


(5.20 or 1-6.8.13) 


Jf2 


P = S2 


with 


N~- 1 
N 






where we assume 1 and where .S? is given by Eq. 1.22, 

~~ with S^xr given by Eq. 1.19, Sfx = S^xx 

and —- * 2 rF- We shall now show that the expected value of "the 

numerator of S' is a consistent estimate of fl times the numerator of <5 
and that the expected value of the denominator of d' is a consistent 





164 SIMPLE CLUSTER SAMPLING Ch. 6 

estimate of f\ times the denominator of <5, and the proof that <5 is a 
consistent estimate of <5 follows immediately from Theorem 20, Ch. 3 
(p. 75). Let 




2c2 


where is given in Eq. 4.12. From Sec. 4b with the substitution 
nJNi ==/ 2 , it follows that 4 is a consistent estimate of 


72 J 


It follows from Sec. 4a that is a consistent estimate of S^. 

It follows from Theorem 20, Ch. 3 (p. 75), that the numerator of S', 
(s^ — n,r|), is a consistent estimate of 

and the denominator of d', [5^ + nifl — 1)41. 's ^ consistent estimate of 


fl[8l + N(N- 1)S|] 


6. The measure of homogeneity when the primary units are equal in size 

(Vol. I, Ch. 6, Sec. 8, Eq. 8.3). The measure of homogeneity, d, between 
listing units (second-stage units) within primary units when each primary 
unit contains N listing units is the intraclass correlation and is given by 


al - a^lN 
{N- l)a^lN 


(6.1 or 1-6.8.3) 


where erf and are defined below. 

When a population is composed of equal-sized primary units (in terms 
of listing units) and when a single primary unit is drawn at random and 
two listing units are drawn at random from this primary unit, the intraclass 
correlation is defined by 

^_ E(^i) ■^) _ (6.2) 

"" Ve{x,,-^ If VE{x,^- If 

From Sec. 2, Ch. 4, with « = 1, it follows that 


£(%- Xf 


M N 

_ i _ 


Xf 


MN 


= 


(6.3) 



Sec. 7 d FOR PRIMARY AND ULTIMATE UNITS 
Substituting Eq. 6.3 into 6.2 gives 

6 a^ = E{Xi, - X){x^^ - X) 

By Theorem 14 of Ch. 3 (p. 61), 

w 


165 


da^ = EElx,,- X)fe- X) = E^^ 


2 Y) 


A(A- 1) 


M N 

1 1 W,,- x) 

i _ 


MN{N~ 1) 


M 

2 


I(x,,-x)\ -l(x,,-xy 

Tj_ / j 


MlViN- 1) 

Equation 6.4 can be rewritten as follows: 


(5(72 


M ^ ^ M ^ 

i(x,-xf ii(x,,-x,r 

M MN{N- 1) 


(6.4) 


_ -2 


" N~ 1 


where 


M 


= X)\ and (7 


M 


1 M i? 


and hence 




and since a'^ — gI ol, it follows that 


gI — g'^IN 

l)o-2/iV 


(6.5) 


( 6 . 1 ) 


^7, Relationship between d for primary units and d for ultimate clusters 

(Vol. I, Ch. 6, Sec. 8, pp. 262 and 266). To prove: The measure of 
homogeneity, for listing units within primary units is approximately 
equal to the expected value of the measure of homogeneity, 62 , for listing 
units within ultimate clusters when the ultimate clusters are formed by 
proportionate sampling of listing units within primary units. An 

* May be deferred. This development is given to illustrate more correctly 
the properties of ultimate clusters. Section 5 suffices to show that a consistent 
estimate of d is readily available from ultimate clusters. 





SIMPLE CLUSTER SAMPLING 


ultimate cluster is defined as all the listing units included in the sample 
from a selected primary unit. 

Proof. Let 

M N 

^ -—S-(7.1 or 1-6.8.11) 

^ {N- l)FViV ^ 


MK- 1 ,, vl 


where is defined by Eq. 5.4, and are defined as in Sec. 1; and let 

MK~\ ^ \ MK-l _ 

. MK ^ En , [ MK ^ En\ 

' (n- \)^llEn E[{n- 1)^1!En] 

where K = NjEn is the number of ultimate clusters into which the listing 
units in each primary unit could be grouped without replacement, where 
the expected value is over all possible sets of ultimate clusters, and where 

bt ~ bov + b^v — ^boxv 0-^) 


^2xr — 


M K /rii \ 

(MK- \)ff 


^2X — ^2XX> ^2Y — ^2YY 

is the value of the X-characteristic for the yth listing unit in the ath 
ultimate cluster of the /th psu, is similarly defined for the E-charac- 
teristic, 

Z = XjK, Y - YjK 

— NJK is the number of listing units per ultimate cluster in the ith 
primary unit. 


NXY 


MK- 

1 o 

n — 1 o 

(7.5) 


— b\-\- 

■ — wi 

MK 


n ^ 

■ ^2X ~ 

i- wIy ■ 

“ 2iV2XF 

(7.6) 

M K 

rii 

yj 


i a 

.i 

n,~ 1 

(7.7) 


^^2X — ^'^2XX^ ^2Y — ^^2YY 





Sec. 7 5 FOR PRIMARY AND ULTIMATE UNITS 167 

We will show 


Ebl 


MK 

MK-l 


and 


M— 1 „ 

n _ 


M 


Ewl = 


(7.8) 


(7.9) 


where /2 ^ IJK and, in Eq. 7.8, n =f 2 N. Substituting these expected 
values into Eq. 7.2, making use of Eq. 7.5, and simplifying, we obtain 


Consider first 


Now 


{N- 1)V^IN 
Eh\ — Eh\x “f~ Eh\Y — 2Eb 


Eb 


2XY 


hXY 


M K 

MK 21EX,,Y,^-MKXY 
MK— 1 MKXY 


(7.10) 


(7.11) 


Now, and are ultimate cluster totals for the ccth ultimate cluster 
in the rth primary unit, i.e.. Y,, = JY,,, and T,, = where 

j j 

^nd are values for the yth listing unit in the /ath ultimate cluster of 
the particular subdivision into ultimate clusters. Since the ultimate 
clusters are formed only within primary units and all possible subdivisions 
are considered, X^^ and are the sample totals for a simple random 
sample of listing units from the listing units in the /th primary unit. 
Hence, 


EX^^Y. 




(7.12) 


Substituting Eq. 7.12 in Eq. 7.11, using = NJK, X = XjK, Y = YjK, 
and simplifying, we obtain 


Eb 


2XY 


MK 
MK- 1 

MK 
MK- 1 


■ M 


M 


liX,-X){Y,-Y) (\-~f,) lNS,iXY 
MX? n NXY 

~M- 1 _ . Wxr 


M 


^XY + (1 —/ 2 ) 


(7.13) 


where in Eq. 7.13 and also in Eq. 7.14 and 7.15, fi = /oW Substituting 
for Fin Eq. 7.13, ^ 

MK 


Ebl 


2 X 


MK- 1 


. M + - 7 - 


(7.14) 




168 

SIMPLE CLUSTER SAMPLING 

Ch. 6 

Similarly, 

, MK \M- 1 „ , ,, 

“ MK- \\_ M ^ K J 

(7.15) 


Substituting Eq. 7.13, 7.14, and 7.15 into Eq. 7.10, we obtain Eq. 7.8. 
Consider now 

Ew\ ~ Ew\x + Ew\y ~~ ( 7 * 16 ) 


Since the elements are a simple random sample from Ni elements, we 
have, by Sec. 4, Ch. 4, 


M K 

IlnAiXV 

Ew,xy = ‘ Vfy 


M 


i 


NXY 


= WxY 


(7.17) 


When X is substituted for E, we obtain 


Ewlx = 


Similarly, 

Substituting Eq. 7.17, 


Ewlr = 

7.18, and 7.19 into Eq. 7.16, we obtain Eq. 


(7.18) 

(7.19) 
7.9. 


8. Some physical properties of frequently occurring populations, and 
values of d under specified conditions (Vol. I, Ch. 9, Sec. 8, and Ch. 6, 

Sec. 8). . i. • 1 

a. Many actual populations are characterized by the following physical 

properties: 

i. The elements within a cluster are positively correlated with regard 
to a specified characteristic. 

ii. Clusters containing large numbers of elements have greater internal 
heterogeneity than clusters containing small numbers of elements. 

iii. Increasing the size of the cluster brings in correlated elements (e.g., 
in population or agriculture surveys, larger clusters are formed by 
including households or farms in adjacent areas). 

The first of these properties is widely recognized, and the losses of 
efficiency through the use of large whole clusters as sampling units are 
frequently cited. The second and third properties hold just as commonly 
in actual populations, and ordinarily for the same populations for which 
the first property holds. 



Sec. 8 PROPERTIES OF COMMON POPULATIONS 169 

The presence of these physical properties leads to the following mathe¬ 
matical relationships which have been found useful in making choices 
among alternative sample designs. 

(1) The sizes of the primary sampling units, are negatively correlated 
with the di, the measures of homogeneity among the elements 
within the primary units, where is defined as either 


E(x,,--Xf 

or 

= 





Xf~ 


1) . 


(2) The Ni and are positively correlated. 

(3) The and are positively correlated, where is the variance 
among elements in the /th primary unit. 

(4) The and cr?/A^. are negatively correlated. 


The use of these relationships can often determine the choice among 
alternative sampling procedures in situations where more specific charac¬ 
teristics of populations are unknown. The relationships, of course, do 
not necessarily hold, and exceptions to them will be found. 


b. The following values of the measure of homogeneity (Eq. 5.5) hold 
for a population of clusters for the conditions specified: 

(1) The maximum possible value for d is 

^(max.) = l (8.i) 

if all listing units in any cluster are alike in that the values of the 
characteristics X^^ and are uniform for all listing units in a 
cluster, but have some different values in different clusters. The 
clusters need not be uniform in size. 

(2) The minimum possible value for d is 

^(inin.) = -^ (8.2) 

The minimum value of 6 is obtained when in Eq. 5.5 is equal to 0. 
This will occur 

(i) if the clusters are equal in size, the estimate is a simple unbiased 
estimate of a mean or total, and X for all /; and 

(ii) for a ratio estimate when XJ F) = /{ for all i. 




70 SIMPLE CLUSTER SAMPLING Ch. 6 

(3) If the clusters are equal in size, i.e., = A, and if the primary 

units are formed by randomly grouping the population of listing 
units into M clusters of N listing units each, then 


(4) For any specified distribution of cluster sizes, A^, Ag, * • •, A^^^, if 
the A^ listing units associated with the /th cluster are a simple 
random sample from all listing units in the population,4hen, for the 
ratio estimate xjy, 

^ _ __- ^ ^ 0 (8.4) 

A-(l + F^) 

For a simple unbiased estimate 


- 4 )- 




Note that Eq. 8.4 and 8.5 reduce to 8.3 when A^ = A. 

The proof of the above theorems is left to the reader. 

^9. Relationship of the measure of homogeneity for second-stage units 
within primary units to the measures of homogeneity for elementary units 
within primary units and within second-stage units (Vol. I, Ch. 6, Sec. 8, 
Eq. 8.16). To prove: The measure of homogeneity, for second-stage 
units within primary units bears the following relationship to the measures 
of homogeneity, and 4, for elementary units within primary units and 
for elementary units within second-stage units, respectively: 

__ A[l+^i(A-l)]-[l +^2(^-1)] (9 1^ 


(A- 1)[1 + d,(K~ 1)] 

^ [1 + ^i(A- 1)] - [1 + d^{K- 1)] 
" (A- 1)[1 + 1)] 

In the population, M is the number of primary units; 


(9.2 or 1-6.8.16) 


M 

N = the number of second-stage units; 


M M Ni 

K = = 2 is the number of elementary units. 


May be deferred. 




Sec. 9 


MEASURES OF HOMOGENEITY 171 

A single bar denotes an average per primary unit, a double bar denotes 
an average per second-stage unit, and a triple bar denotes an average per 
elementary unit. By definition, 




B 


M 

2(Y,-Y)(7,- Y) 


M 


N 


LXY 


{M- \)XY 

^LX — ^LXXy ^LY ^ ^LYY 

WIX+ Wly-2Wj,^^y 


Wlxy = 


M jY ^ 


NXY 

nx=-^LXX; nY=f^LYY 


Also, 


M-~ 1 


n 


n-^Bi 


(K- \)VIIK 
Bl = Bl 

Wl=Wl^+Wlr-2fV,xr 


M 


K- 1 
K 


ff'lXY = 


M N, K„ = _ 

>: 2 2 - x,){ y,,, - Y,) 


and 


KXY 

^ix — ^ixx’ ^lY — ^irr 


N— 1 


(K~ 

— B^x + B\y 


N 


K 


2 B, 


2XY 


(9.3) 

(9.4) 

(9.5) 

(9.6) 

(9.7) 

(9.8> 

(9.9) 


Wf (9.10) 

(9.11) 

(9.12) 


(9.13) 

(9.14) 

(9.15) 

(9.16) 





172 

SIMPLE CLUSTER SAMPLING 

Ch. 6 


M 

- X){Y,- Y) 

2^^ (iV- l)XY 

(9.17) 


^2X ^ ^2XXy ^2Y “ ^2YY 

(9.18) 


Wl=Wlx+Wlr~2W,^r 

(9.19) 


M N, AT = — 

2 2 2 iYn - A«)( T,« - T.-,) 

W,xr - 

KXY 

(9.20) 


Wlx = W,xx, VYlr = l^2Fr 

(9.21) 


fin N ’ ^ 

(9.22) 

Proof. From the pairs of equations 9.3, 9.10, and 9.15 
obtain Eq. 9.23, 9.24, and 9.25 as follows: 

we readily 





'~{N-l)BllN N 

1)1 (9.23) 


Bl = U[\+ d^(K-l}] 

M ^ a: ^ ^ 

(9.24) 


^ B| = 3 [I + d^{K 1)] 

N^K 

(9.25) 

Since Bl = B|, the right-hand member of Eq. 9.24 = the right-hand 
member of Eq. 9.23, and by substituting Eq. 9.25 for {N- l)fi|/iV in the 
numerator of the right-hand member of Eq. 9.23, we obtain 

^[1 4 
K 


SdN- 1)1 

(9.26) 


If we solve Eq. 9.26 for we obtain Eq. 9.1. In most cases X will be 
close to 1. In fact, when the primary units are equal in size and also the 
second-stage units are equal in size, then 2 = 1. 


10. Optimum values for a simple two-stage sampling design with a simple 
cost function (Vol. I, Ch. 6, Sec. 16). For a simple two-stage sampling 
design, the optimum number of primary units, m, and the optimum 





Sec. 11 OPTIMUM VALUES 


173 


expected size of ultimate cluster, subject to a fixed expected total 
expenditure, C -- C^m + C^mfi, are 


and 


opt. m 


C 

T" C2P 


( 10 . 1 ) 


opt. n — 


i 


Cl 



l-d 

d 


(10.2 or 
1-6.16.2) 


where (5 is given by Eq. 5.5 and and are defined in Sec. Ic, and n 
in Eq. 10.1 is the optimum n. 

Derivation. The rel-variance of a simple two-stage design (see Sec. 1, 
Eq. 1.25) is 


t .2 . n\ 


To obtain the values of m and n which make Vf a minimum for a fixed 
total expenditure, set up the Lagrangian FA(C^m + C^mn - C). 
Then the solution of the equations, BF/^m -= 0, 3/73/7 = 0, and the cost 
equation will give the optimum values. Thus 

3F 


dF 

+ 00 - 5 ) 


Multiplying Eq. 10.4 by m and subtracting Eq. 10.5 multiplied by n, we 
obtain 


From Eq. 10.5 




ff2- 1^21 Fi 


?inF‘ — 




( 10 . 6 ) 

(10.7) 


Equating the right-hand members of Eq. 10.6 and 10.7 and solving for fi, 
we obtain the optimum n as given in Eq. 10.2. Then opt. m is obtained 
by substitution of this result in the cost equation. 


*11. Optimum values for a simple two-stage sampling design with a more 
general cost funct ion (Vol. I, Ch. 6, Sec. 18, 19).* For a simple two-stage 

* This development and the proof of convergence are due to B. J. Tepnine 

and B. Skalak. ® 

* May be deferred. 



SIMPLE CLUSTER SAMPLING 


sampling design, the optimum expected size of ultimate cluster is 


-+ C. . (11.1 or 

;_f Zl g 1-6.18.1) 

Co <3 


V C2 d 

where the expected total expenditure is C = C^Vm + C^m + C^mn and 
where m, the optimum number of primary units, is determined to yield 
either (a) a minimum error when the total expenditure, C, is fixed, in 
which case 


fn = —, or iVm a 

4 


(11.2 or 1-6.18.3) 


1 + 4 C c^±ci __ 

Cq Cq 

Cl + Con 


(11.3 or 1-6.18.2) 


or {b) a minimum expenditure when the precision is specified as e, in 
which case 

I ^ m ^ _ 

/ a (11-4 or 

“^./4 -S e \ '^ 5 n) 1-6.19.2) 


where S is given by Eq. 5.5 and and are defined in Sec. Ic. 

The optimum solution for fi and m subject to the condition of fixed 
cost is obtained by substituting any guessed value for a in Eq. 11.1, then 
substituting the resulting value for n in Eq. 11.3, then substituting the new 
value for a back into Eq. 11.3, and continuing this process until successive 
solutions for n and a yield the same values to the desired accuracy. Then 
the final a is substituted in Eq. 11.2 to solve for the optimum value of m. 
The optimum values subject to a fixed variance are obtained by a similar 
process, using Eq. 11.1, 11.4, and 11.2. 

It can be shown that this iterative process will converge, but the proof 
is beyond the scope of this book. 

Derivation of the terms used in the iterative process. The rel-variance 
of a simple two-stage design (see Sec. 1, Eq. 1.25) is 



(11.5 or 1-6.6.10) 




Sec. 11 


OPTIMUM VALUES 


175 


To determine the values of m and n which minimize the precision subject 
to a fixed cost we set up the Lagrangian : 

F(j = Hi -f- -f- C2pin — C) 


To determine the values of m and fi which minimize the cost subject to a 
prescribed precision, s, we set up the Lagrangian Fe : 

Fe = /i(F^-8) C 

Then 


3m 


. 3Fe 

—zz + kC^m = 0 ~ with 7 
mn‘^ dn 




g I 1 J gp 
rn^ mhl m^N ’ \ 2 Vm 


+ Q + Q/7 =0 


with k 


From Eq. 11.6 we obtain 


km^ 




( 11 . 6 ) 

3m 

(11.7) 

( 11 . 8 ) 


Multiplying Eq. 11.6 by » and subtracting Eq. 11.7 multiplied by m, we 
obtain 


km^ ~ 


^ 2 - ^r2|]Sl 
{CqIIV m) + Q 


(11.9) 


Equating Eq. 11.8 and 11.9, solving for /?, and substituting iVm ~ a, we 
obtain the optimum n given in Eq. 11.1. The alternative forms involving 
d are obtained by using the rel-variance in terms of d as given in Eq. 5.1 
of Sec. 5. For the case in which the cost is fixed we obtain a as given in 
Eq. 11.3 above by recognizing the cost equation as a quadratic in Vm. 
For the case in which the precision is fixed we obtain a by setting as 
given in Eq. 11.5 (or for the alternative form Vl as given by Eq. 5.1) equal 
to s and solving for m. Then a = V4m. 


REFERENCES 

(1) M. N. Ghosh, “Expected Travel among Random Points in a Region ” 
Calcutta Stat. Assn. Bull. 6 (1949), 83-87. 

(2) R. J. lessen, “Statistical Investigation of a Sample Survey for Obtaining 

Farm Facts,” Iowa Agr. Exp. Stat. Res. Bull. 304 (1942). ^ 







176 SIMPLE CLUSTER SAMPLING Ch. 6 

(3) P. C. Mahalanobis, “A Sample Survey of the Acreage under Jute in 
Bengal,” Sankbya, 4 (1940), 511-530. 

(4) P. C. Mahalanobis, “On Large-Scale Sample Surveys,” Phil. Trans. Roy. 
Soc., Series B, 231 (1946), 329-451. 

(5) Eli S. Marks, “A Lower Bound for the Expected Travel among m Random 
Points,” Annals Math. Stat., 19 (1948), 419-422. 

(6) Garnet E. McCreary, “Cost Functions for Sample Surveys,” unpublished 
thesis, Iowa State College, Ames, Iowa, 1950. 

(7) F. Yates and 1. Zacopanay, “The Estimation of the Efficiency of Sampling 
with Special Reference to Sampling for Yield in Cereal Experiments,” 
J. Agr. Sci., 25 (1935), 543-577. 

(8) U. S. Bureau of the Census, Sampling Staff, A Chapter in Population 
Sampling, U. S. Government Printing Office, Washington, D. C., 1947. 

(9) W. G. Cochran, Sampling Techniques, John Wiley & Sons, New York, 
1953, Chapters 9 and 10. 

(10) W. E. Doming, Some Theory of Sampling, John Wiley & Sons, New York, 
1950, Chapter 5. 

(11) F. X. Schumacher and R. A. Chapman, Sampling Methods in Forestry and 
Range Management, Duke University, School of Forestry, 1942, Chapter 6. 


CHAPTER 7 


Stratified Single- or Multi-stage 
Cluster Sampling 


DERIVATIONS, PROOFS, AND SOME EXTENSIONS OF 
THEORY FOR CH. 7 OF VOL. I* 


Note. The theory for stratified cluster sampling with one or more stages 
of sampling is presented in this chapter. Topics covered include the estimate 
ot the variance and its components, gains due to stratification with duster 
sampling, and the optimum design under selected cost conditions. 

It is sometimes necessary to isolate the contribution to the total variance 
attributable to a particular stage of sampling or to express the variance in terms 
of the contribution from each stage of sampling. This chapter indicates a 
procedure which makes it possible to write down the variance of sample esti- 
wiance ^ number of stages of sampling in terms of the components of the 


The notation in this chapter is the same as that introduced in Chapter 6 
except that here a subscript (h) is added to designate the strata. 


1. The rel-variance of a ratio estimate for a two-stage stratified sampling 
design (Vol. I, Ch. 7, Sec. 5). Assume that we have a population con¬ 
sisting of L primary strata, Mj^ primary units in the ^th stratum, and 
second-stage units within the Mh primary unit. Assume, further, that a 
simple random sample of nij, first-stage units is selected from Mj,, and a 
simple random sample of second-stage units is selected from the 7V^- 
in the hiih primary unit. Now let * 


r 


x' 

y' 



( 1 . 1 ) 


* Appropriate references to Vol. I are shown in parentheses after section or 
subsection headings. The number following I- after some equations gives 
the chapter, section, and number of that particular equation in Vol. I. 

177 





Sec. 1 REL-VARIANCE OE RATIO ESTIMATE 179 

follows that Ex], = X„ when we recognize that x'^ is given by in Eq. 1.1, 

Ch. 6, with the subscript h added. Hence, Ex' = f = X. Similarly 
Ey' = Y. 

By Eq. 1.8, Ch. 5, 




24.. 


and 


^x'v' 


L 

2^ 


XhV n 


Now is given by Eq. 1.2, Ch. 6, with the subscript h added provided 
we assume that x’ in Eq. 1.1, Ch. 6, is an estimate of a stratum total; 

is similarly defined; and is given by Eq. 1.12, Ch. 6, with the 
subscript /i added. Assembling the terms representing the first-stage 
contribution to the variance gives the first term in Eq. 1.2. Similarly, 
assembling the terms representing the second-stage contribution to the 
variance gives the second term in Eq. 1.2. 

When the second-stage sampling fractions are the same for all first-stage 
units in a stratum, i.e., nJN^, ^ and mJM^ = then the over-all 
sampHng fraction in the hth stratum is /, = /u/ 2 ,, the estimate (Eq. 1.1) 
becomes 


r 


x' 

y' 


and the rel-variance of r becomes 


Jh 

2 TVh 
Jh 


(1.8 or 1-7.5.13) 


_L 4 j£| 1 4 


with Slf^ given in Eq. 1.3, 


SI 


2h 


and Slf^^ given in Eq. 1.5. 

Equation 1.9 may also be written 


Mu 

M,N 


h^^h 


(1.10 or 1-7.5.15) 


where 


— , 1 Wl (l.Ilor 

+ I-7.5-16) 




A. 




— H/2 ^ 

n " A? 


M’ 


A». = 


N,. 


and where in Eq. 1,9 and 1.11 ij,, is the expected number of 

listing units in the sample per psu in the sample for the hth stratum. 





180 STRATIFIED CLUSTER SAMPLING Ch. 7 

2. The estimate of the rel-variance for two-stage stratified sampling 

(Vol. L Ch. 7, Sec. 6). An estimate, v% of the rel-variance, F? (Eq. 1.2), 
for a two-stage stratified design in which a simple random sample of at 
least two first-stage units is selected within each stratum and a simple 
random subsample of second-stage units is selected is given by 


where 


2 1 2 
X h rrlj^ 


^c'h — ^c'hX + ~ '^^^c'hXY 


(2.1 or 1-7.6.1) 


(2.2 or I-7.6.2) 


m ft 


(2.3 or I-7.6.5) 


x'u = —^u^ Xn^ixyrn'i, 

^hi 

and are similarly defined, 

2 2 _ 

V/iZ — AViZZ? ^c'/iY ^c'hYY 

ml is the number of primary units from the hth stratum used in estimating 
the variance and may be smaller than mj^. This result follows immediately 
from Sec. 3, Ch. 6, with the subscript h added to each estimated variance. 


When the sample is self-weighting within strata (i.e., 
^hl^h ~fihy —-Jihf 27 X ^q. 2.2, becomcs 


where 


4k ~ 4iiX + ^"^^hY ~ 


2 (p^hi - ^h){yhi - Vh) 


mn-~ 1 


— > y Vk 


^chX — ‘‘chXX '*c4r ~ ^ehYy 


( 2 . 6 ) 


Sec. 3 ESTIMATES OF VARIANCE COMPONENTS 181 
An estimate of Vf which is more accurate than Eq. 2.1 is 

= Ji 2(1 - A) 4 (2.7 or I-7.6.6) 

Jh 

Note that, when T =: 1, and when 4 is estimated from the units in 
the sample, Eq. 2.7 reduces to Eq. 3.19, Ch. 6. 


3« Estimates of the components of the rel-variance of a ratio for two- 
stage stratified sampling design (Vol. I, Ch. 7, Sec. 7). Consistent 
estimates of the components of the rel-variance of a ratio, r, for a two- 
stage stratified sampling design may be made as follows: The estimate, r, 
is given by Eq. 1.1. The rel-variance of r is given by Eq. 1.2. 

A consistent estimate of the within~psu component of the rel-variance^ 
i.e., of the second term in Eq. 1.2, is 




hi ^ ^ hi 


X 


mu 


''hi 


N: 


^Ui 


hi 




1 


(3.1) 


where ml is the number of first-stage units from the hth stratum used in 
estimating the variance and may be less than 


with 


^Ihi ~ 




2hiXY 


^2hiXY 


— 2 (^hij ^hdiVldi ~ Vhi) 


rjj,, — 1 


(3.2) 

(3.3) 


c2 

^^2hiX 


~~ ^2hiXXy 


and sl}^iY — ^2hiYY 


Equation 3.1 follows from Sec. 4, Ch. 6, when the estimates in that chapter 
are assumed to be estimates of the variances for the hth stratum, and 
from the fact that the variance of a stratified sample sum is the sum of 
the within-strata variances. 

A consistent estimate of the between-psu component of the rel-variance, 
i.e., of the first term in Eq. 1.2, is 


1 


(4h-sl) 


(3.4) 


where 4 is defined in Eq. 3.1 and s% is defined by Eq. 2.2. This result 
follows immediately from Sec. 4, Ch. 6, and the fact that the variance for 
a stratified sample sum is the sum of the within-strata variances. 




182 STRATIFIED CLUSTER SAMPLING Ch. 7 

4. The rel-variance of a ratio estimate for three- or more stage stratified 
sampling (Vol. I, Ch. 7, Sec. 12). To prove: The rel-variance of a ratio, 
r — x'ly\ for a stratified sampling design with K stages of sampling can 
be expressed as the sum of K terms each representing the contribution 
from one of the stages of sampling. In the special case of three-stage 
sampling, it will be shown that the rel-variance of x'jy' is equal to 


Eq. 1.2 


1 




h ^ 


2. 

j Hhij 


81 


Zhij 


(4.1) 


-Mj 


where Eq. 1.2 represents the contribution from each of the first two stages 
of sampling and where the last term of Eq. 4.1 represents the contribution 
to the rel-variance from the third stage of sampling. The notation is 
defined below. 

Proof. From Sec. 11, Ch. 4, we have for the rel-variance of a ratio of 
random variables 


where 


1 


(Ex’f 


(Ex') 


- (cr| + 


(Ey'f Ex'Ey' 


Ex' X 
^ Ey' ^ Y 


(4.2) 


For a stratified population and a simple random sample of units at each 
Stage of sampling l 

x' - 


where Xj^ is the simple unbiased estimate of the hth stratum total. 
For three stages of sampling the simple unbiased estimate of Xj^ is given 
by Eq. 4.8, and this form of estimate is extended to obtain the simple 
unbiased estimate for any number of stages of sampling. Similarly, 


L 


y' = 2^1 


From Sec. 1, Ch. 5, 




L 

and Ox'y' ~ ^^x\y'h 


(4.3) 


We shall now indicate how to apply Theorem 16 to express in terms 
of the contribution to the variance for the hth stratum from each stage of 
sampling. The developments for and follow exactly the same 
steps. By Theorem 16, Ch. 3 (p. 68), with u == %, s,- = E{Xf\j\\ the 



Sec. 4 REL-VARIANCE OF RATIO ESTIMATE 183 

contribution to the variance in the /7th stratum from the yth stage of 
sampling is 

2, 3 , • • s j-i] (4.4) 

In Eq. 4.4 the symbol 

= 2, • %j]) (4.5) 

is the expected value of considering the units selected at the yth stage 
of sampling as strata. Since the yth-stage units are selected from the 
( 7 — l)th-stage units, etc., regarding the yth-stage units as strata implies 
that the units selected at all previous stages are also fixed. Now 

2, 3, • • -.i-l] (4.6) 

means the variance of (Eq. 4.5), where the variance is evaluated within 
the units selected at the (j — l)th stage of sampling. If a simple random 
sample ofyth-stage units is selected from each (7— l)th-stage unit in the 
sample, the units selected at the (7— l)th stage of sampling can be 
regarded as strata; and, from Corollary 1 to Theorem 11 , Ch. 3 (p. 56), 
we can write down the conditional variance of any linear combination of 
random variables. The final step of taking the expected value of Eq. 4.6 
makes use of the theorem (Theorem 5, Ch. 3, p. 48) that the expected 
value of a sum is equal to the sum of the expected values, or, more gener¬ 
ally, Theorem 6 , Ch. 3 (p. 49), for a linear combination of random 
variables. The above steps make it possible to write down the contribu¬ 
tion to the variance from any stage of sampling for any linear combination 
of random variables, as will be illustrated below. 

Consider the evaluation of Eq. 4.4, for the contribution to the variance 
from the third stage of sampling. We must then evaluate 

(4.7) 

where 

^ 3 -^(41 [1,2, 3]) 

Suppose that Mj, and mj, are the number of first-stage units in the hih 
stratum in the population and in the sample, respectively; and 
are the number of second-stage units in the hith psu in the population 
and in the sample, respectively; and and are the number of 
third-stage units in the hijth second-stage unit in the population and in 
the sample, respectively. Then 


(4.8) 


, ,, Nt,- 

^3 = 2, 3 ]) = —* 2 2 — 2 


i ^hi j k 

— ItJLy Uhl V 

m • 


J. 




184 


STRATIFIED CLUSTER SAMPLING 


Ch. 7 


where is the value of X for the hijkih third-stage unit in the sample, 
and 

= —(4-9) 

is the sample estimate of the value of X for the hijih. second-stage 
unit in the sample. By Corollary 1 to Theorem 11 of Ch. 3 (p. 56), 
since we can regard the second-stage units in the sample as strata, 

Af? Nf- 

= (4-10) 

and since a simple random sample of third-stage units is selected from 
each second-stage unit in the sample, we have 



~2 /n2 Qhij ___£ft 


(4.11) 


Uhmhij 



and 

Qhii 




q9. S i^hijk ~ 

^ShijX — ^ 

\ihij 

- 

- 1 

(4.12) 


By Theorem 6, Ch. 3 (p. 49), the contribution of the third stage of 
sampling in the hih. stratum is 

2 2 _ ^ ^hi ^ ^2 Qhij q2 

^^2sj[l,2] -jyr X 2 AT ^ Hhi} Q max 


Or.-—a, 

Ti ht ^2 y/uj Q2 

Z Z ^hij ^ ^ 3 hijX 


(4.13) 


In the same way we can show that the third-stage contributions to 
and are, respectively, 


where 


2 _ Y' /02 Qhij ^hij 02 


Or. — Or. 

4^ ft V Zft^ ■ST' V^ft^J Hhi) q 

Z Z Um} 

^ft i 3 Qfiijqhij 


^3hi3XY 


2 (.^hijJc ^hi}k '^hij) 

k _ 

Qhij “ 1 


(4.14) 

(4.15) 


^IhijY ^ShijYY 


and 


(4.17) 



Sec. 5 


GAINS DUE TO STRATIFICATION 


185 


Now, the third-stage contribution to the variance of x' = 2^^ is, by 


Eq. 4.3, 

L 



7 , 

(4.18) 

Similarly, 

L 




(4.19) 


h 



L 



'Z^3x\y\ — ^Zx'v' 

(4.20) 


We now have from Eq. 4.2 that the third-stage contribution to the rel- 
variance of x'jy' is 


where 


-«. + 7iV|,.-2/?ff3,,,,) 


^ ’>4; Nu Qlu Qm - qnu « 

^ t nu t <7;„v Qu, 

(4.21) 


(4.22) 


In Sec. 1, the rel-variance of r = x'jy' was developed for a two-stage 
design. Hence, the terms of Eq. 1.2 represent the contributions from the 
first and second stages of sampling. It follows that for a three-stage 
design (Eq. 4.1) is given by Eq. 1.2 + Eq. 4.21. 


5, Gains due to stratification with duster sampling and a comparison 
with gains due to stratification with simple random sampling of listing units 
(Vol. I, Ch. 7, Sec. 4). To prove: For a proportionate stratified random 
sample of equal-sized clusters from equal-sized strata, the relative gain 
due to stratification is given approximately by 


xfN 

T(t2[1 + d{N- 1)] 


(5.1 or 1-7.4.1) 


From Sec. 7 of Ch. 5, the relative gain due to stratification when a simple 
random sample of listing units is selected from the same strata is 
approximately 

L 


La^ 


(5.2 or I-7.4.2) 


The ratio of Eq. 5.1 to Eq. 5.2 is equal to 

R 

1 + d{N-- 1) 


(5.3) 





186 STRATIFIED CLUSTER SAMPLING Ch. 7 

and is the factor by which the relative gain due to stratification with 
simple random sampling of listing units must be multiplied in order to 
obtain the relative gain due to stratification with cluster sampling providing 
the same strata are used for both simple random sampling of listing units 
and for cluster sampling. 

In Eq. 5.1 and 5.2, 

k N 

y yx 

is the average per listing unit in the /zth stratum. 

N — the number of listing units per cluster. 

M = the number of clusters per stratum. 

X = y,XjL is the average per listing unit over all strata. 


2 2 


is the population variance for a simple random 


sample of listing units. 

d = the intraclass correlation among listing units given by Eq. 6.1, 
Ch. 6. 

Proof. From Remark 1, Sec. 1, Ch. 5, the variance of a mean for a 
proportionate stratified sample with equal-sized clusters from equal-sized 
strata is 

(5.4) 

m L 


where 


m m ^ 


^2,0^hi ^ hf 

k- 1 


M - Xf 

M- 1 L k 




where 

Thus, 


-Tfti — ^hil^h 




m 


(5.7) 



Sec. 6 OPTIMUM VALUES 


187 


Now, the variance of a mean for a simple random sample of m clusters 
is (see Sec. 2, Ch. 4) 


2 ^ 

C/o — . ■' 

m M 

which may also be written, from Sec. 5, Ch. 6, 

Hence, the relative gain 

<yl-<yl ^ Eg. 5.7-Eg, 5.8 
(tI Eg. 5.9 

is given by Eg. 5.1. 


(5.8) 


(5.9) 


6. Optimum values for a two-stage stratified sampling design with 
variable sampling fractions and a simple cost function (Vol. I, Ch. 7, Sec. 9). 
To prove: For a two-stage stratified design with variable sampling fractions 
among strata and proportionate sampling within sample first-stage units 
in a stratum, the optimum expected size of ultimate cluster, n^, and the 
optimum number of sample primary units, m^, for a stratum, when the 
expected cost of the survey is 

L L 

c = SCuWft + (6.1 or 1-7.9.1) 

and the total expenditure is fixed, are 


opt. 










=y 


C 


lA 


Wl 


opt. m„ = .j-__ 

h 


(6.2 or 
CuSt-Wl/N, I-7.9.2) 

(6.3 or I-7.9.3) 


where Bl, and are defined in Sec. 1, and where is the cost 

per first-stage unit in the sample from the hth stratum, € 2 ^ is the cost 
per second-stage unit in the sample from the hth stratum, and where 
% Eq. 6.3 is the optimum value. 

Proof. The rel-variance for a two-stage stratified design with uniform 
second-stage sampling fractions within strata is (see Eq. 1.9) 






m, 




S7 

V‘h TV,. 


(6.4) 


where the terms are defined in Sec. 1. To detern.ine the values of m. 





188 STRATIFIED CLUSTER SAMPLING Ch. 7 

and fij, which minimize Vf subject to a fixed total expenditure C, with the 
cost of the survey given by Eq. 6.1, we set up the Lagrangian F: 

F — + 'Z^2h^hFh ~ 

Then the solution to the equations 0, = 0, and the cost 

equation will be the optimum. Now 

= _ MMji _ M3l + 4 . A(Ci,. + C^^nn) = 0 (6.5) 

5m. X^ml X^mluj, X^mlNj, 


K^ih + — 0 (6.5) 


NfAn 




From Eq. 6.6 we obtain 


,V2 2 

XX^ml = 


Multiplying Eq. 6.5 by and subtracting Eq. 6.6 multiplied by %, we 

Equating the right-hand members of Eq. 6.7 and 6.8 and solving for n^, 
substituting = Nj,, we obtain the optimum fi;, as given in Eq. 6.2. 

The alternative form of nj,_is obtained by making the substitutions 
Bl XI and Wl-~=^ SIJXI 

Now, substituting 

1 NA, 

m. = -7=: 

V X XnAC2n 

from Eq. 6.7, into the cost equation and solving for VI, we obtain 

Va = |(Q,+ QA)V5^/^ 

Substituting Eq. 6.9 into Eq. 6.7 and simplifying, we obtain the optimum 
as given in Eq. 6.3. 


7. Optimum values for a two-stage stratified sampling design with 
variable sampling fractions and a more complicated cost function (Vol. I, 
Ch. 7, Sec. 11). a. Consider again the situation in Sec. 6 but with an 
added term C„V« in the cost function measuring the travel and perhaps 
other costs of the survey, i.e. ^ 

C =■- C„Vm + + ICinmhf'h (^-l or 1-7.11.1) 

with m — Sm^j. 


Sec. 7 OPTIMUM VALUES 

To prove: The optimum fij, is 


189 




and the optimum nij, is 


°P*- (7-2 or 1-7.11.5) 


opt. m„ 


Clau 


(7.3 or 1-7.11.6) 


where and d are determined from the Eq. 7.4, 7.5, and 7.6 below, 

following a process of successive approximation similar to that described 
in Sec. 11, Ch. 6. 

v'sf^-ivA ^ 


X 


Vd+ C 


Ih 


Vd+C 


Ih 


(7.4 or 
1-7:11.2) 


2c,„a, + 


L 

— — . L .^ 

2“^ 


(7.5 or 1-7.11.3) 


iVm 


/ 1 1 


(7.6 or 1-7.11.4) 




where and Wl are defined in Sec. 1. 

Proof. In this case the Lagrangian is 

F=V^ + A(C„Vm + 2C^,m, + - C) 

with given in Eq. 6.4. Then 

■Ar9a2 . V-, 

'ih t" = 0 

O.T) 

(7.8) 


NISI / C _ 

3m, X^ml X^mln^ ^ X^mlN^ + t2Vm 


3F 


3n, X^m„nl 


S + 2C2*m, = 0 




190 STRATIFIED CLUSTER SAMPLING 

From Eq. 7.8 we obtain 

XX^ml SIj, 


Ch. 7 


(7.9) 


Nl C^n^n 


Multiplying Eq. 7.7 by and subtracting Eq. 7.8 multiplied by n^, we 
obtain 

XX^ml Nl 


Nl 


+C 

-7^ “T ^Ih 

2V m 


(7.10) 


Equating the right-hand members of Eq. 7.9 and 7.10 and solving for 
we obtain 




2Vm 


+ c 


Ih 


81 


(7.11) 


Substituting Eq. 7.4 and 7.6 into Eq. 7.11, we obtain the optimum n, 
given in Eq. 1,2. Now, if we let 


as 


m 

then the cost equation may be written 

C = CqVw + bm 

Considering the cost equation as a quadratic in Vm, we obtain 

_Co + VcfT^ 

Vm- 

and 


(7.12) 


iV m 


I 4C ^ , 


(7.13) 

(7.6) 


Cl 


Now, solving Eq. 7.8 for we obtain 


NAn 


(7.14) 


* V XXfi^'V ^ 

Substituting Eq. 7.11 for we obtain a, as given in Eq. 7.4. Further, 



Sec. 7 


OPTIMUM VALUES 


191 


Hence, substituting l/VI = m/Ja,. into Eq. 7.14, the optimum m, is 
ma. 


m, 


(from Eq. 7.6) 


(7.3) 


‘0 L — L 

and substituting ajVJ for N.SJXVIV^^ for from Eq. 7.14, 
and 5,^/V;i for m from Eq. 7.15, into Eq. 7.12, we have b as given in 

J-'CJ* /,D. 

b. Suppose that the cost of travel between psu’s varies from one stratum 
to another and thus the cost function is expressed as follows: 


^ _ L L 


(7.16) 


The rel-variance is again of the form given in Eq. 6.4. Then the optimum 
values of and for a fixed total expenditure are as follows: 

Cln 


opt. = 


4^1 


(7.17 or 1-7.11.10) 


opt. n* = 


N.8, 


2h 


XnW, 


N. 






V mf^) + C 


'17i. 


-2h 


bI-whn, q, ^ 


ih 


(7.18 or 
1-7,11.11) 


where and are determined from the following iterative equations: 


a. = 


M, 

X 


X, 


- NA ~ - WUN, 


and 


h — 


Vd^ + c 


Ih 


V'rf. 4^ C 


(7.19 or 
1-7.11.8) 


'17l 


"Oft 


iVnh 


(Iqa + |^Vc,,)c„,/v^ 


J (ic„,V^)V4c(iQ,n,+|^Vc^,)-|c„,V^ 


(7.20 
or 
I 1-7. 
11.9) 







192 STRATIFIED CLUSTER SAMPLING Ch. 7 

The derivation of these formulas follows the 
derivation of the optimum values in Part a above and is left to the reader 

as an exercise. 

8 Optimum values for a stratified sampUng design with joint use of 
one- and two-stage sampling, variable sampUng fractions, and a simple cost 
toctTfvol. I Ch. 7 sL 10). To proye: For a stratified sampling 
design with two-stage sampling in strata and one-stage sampling i i 
strata and with the sampling fraction within first-stage 
to U, then the optimum and when the expected cost of the 

survey is 

C = 4- -|- InitCl (8.1 or 1-7.10.1) 


and the total expenditure is fixed, are 


1 YJL Tf- 

opt.m, = ^« 


Opt. Hi} 


Mysl-N^s 


(8.2 or 1-7.10.2) 


(8.3 or 1-7.10.3) 


n I 


opt. 




(8.4 or 1-7.10.4) 


a = Cli^MnVCuVSl^- NA + V^A^C, 

\ h 


(8.5 or 1-7.10.5) 


Proof. The rel-variance of the design is 

1 ^ Ml M^-m^ 2 ( 8 . 6 ) 

X'^k Mic “ 

where and SL are defined in Sec. 1 and SI is defined as Si, but over 
the Li strata. The Lagrangian F is 

F=Vl + -t- 4- c) 



Sec. 8 
Then 


dF 

M18\^ 


X^ml 


Nisi 



dF 

MlSl , 


X^ml ^ 


OPTIMUM VALUES 
Nisi. NfA , , 


193 


inj 


KCiumn) = 0 


From Eq. 8.8, we obtain 


XmlX^ 81 


2* 


Nh Cojfll 


( 8 . 8 ) 

(8.9) 

( 8 . 10 ) 


Multiplying Eq. 8.7 by m,, and subtracting Eq. 8.8 multiplied by we 
obtain 

( 8 . 11 ) 


m 


■'Ih 


Equating the right-hand members of Eq. 8.10 and 8.11 and solving for 
fij,, we obtain the optimum % as given in Eq. 8.4. From Eq. 8.9 


m, 


MA 1 
Vcl x^i 


( 8 . 12 ) 


Let \I{XVX) = a to obtain as given in Eq. 8.2. Now, solving Eq. 8.8 
for m^, we have 

N„8^n 1 

= — -;r (8.13) 

n.Vc,, xVA ^ ’ 

and, substituting \I{XVI) = a and from Eq. 8.4, we obtain the optimum 
nij, as given in Eq. 8.3. 

Finally, substituting Eq. 8.2, 8.3, and 8.4 into the cost function and 
solving for a, we obtain a as given in Eq. 8.5. 


REFERENCES 

(1) W. G. Cochran, The Use of Analysis of Variance in Enumeration bv 
Sampling,” /. Amer. Stat. Assn., 34 (1939), 492-510. 

(2) R. J, lessen, “‘Statistical Investigation of a Sample Survey for Obtaining 
Farm Facts,” Iowa Agr. Exp. Stat. Res. Bull. 304 (1942). 



CHAPTER 8 


Control of Variation in Size of Cluster in 
Estimating Totals, Averages, or Ratios 


DERIVATIONS, PROOFS, AND SOME EXTENSIONS OF 
THEORY FOR CH. 8 OF VOL. P 

Note. When primary sampling units vary in size, i.e., in the number of 
elementary units or listing units that they contain, some methods for control 
of variation in the size of cluster in the selection of a sample and in the estima¬ 
tion are sometimes useful. The derivations in this chapter relate to this prob¬ 
lem. Ordinarily, the use of some method for controlling the variation in size 
of cluster is much more important in estimating totals than in estimating ratios, 
although many of the results for which proofs are needed and are given in this 
chapter deal with the problem of estimating ratios. 

1. Sample estimates and their variances for a two-stage sampling design 
when first-stage units are selected with varying probabilities (VoL I, Ch. 8, 
Sec. 14). To prove: If first-stage units are selected with varying proba¬ 
bilities, with replacement, and with any second-stage sampling fractions. 


is an unbiased estimate of the population total X, where x.^ ^ N^xjn^ is 
an unbiased estimate of the psu total is the probability of selecting 

the ith psu on a single draw, m is the number of psu’s in the sample, 
is the total number of second-stage units in the /th psu, and is the number 
subsampled from the /th psu. The ratio r = x'jy , where x and y are 
defined by Eq. 1.1, is an estimate of R= XjY. 

The rel-variance of r (and of x' as a special case) is 

( 1 . 2 ) 

* Appropriate references to Vol. I are shown in parentheses after section or 
subsection headings. The number following I- after some equations gives 
the chapter, section, and number of that particular equation in Vol. I. 

194 


Sec. 1 
where 


VARYING PROBABILITIES 


195 


\ ^ N■ — n. 

/ ^ P. N.rt (1-3 or 


mXY 


Pj N,n, 
mXY 


1-8.14.7) 


2(3^«-x,)(y,,- F,) 


N,- 1 


K^Vyr 


Remark. The estimate given by Eq. 1.1 would be unbiased for samples 
drawn with varying probabilities, P, for the /th psu, whether or not the 
«n!5"i 'vith replacement. The variance, given by Eq. 1.2 

and 1.3, holds only for sampling with replacement but may be a satisfactory 
approximation for sampling without replacement, especially if no P is 
large relative to 1, and if m/Mis small. y u no r-, is 

Proof. By Theorem 6, Ch. 3 (p. 49), 

PtiPinA 


and since 


1 w ^ n, 

«i PiPi j 


1 Y' 

— Xy' — :zi 

A.t N. 


■where X'ij and X^ are used to indicate the fact that X'^ and X'. are random 
variables depending on the result of the ith selection of first-stage unit, 

1 F' 1m Y' 

Ex' = E-y^ = ~yE^ 
mf P, mT P, 

1 m M V 

Similarly, Ey^ Y. ' 

Consider now V^y==a^JXY. By the corollary to Theorem 17, 
Ch. 3 (p. 68), with a: - 2 and « = fr', IV = y\ we may write 

^^xYm + ^mx'\[i])Eiy'\m) ( 1 . 6 ) 

where [1] refers to a fixed set of first-stage units in the sample. Consider 
the first term in the right-hand member of Eq. 1.6. Since for a fixed set 
of w first-stage units the sampling from one of the units is independent of 



196 CONTROL OF VARIATION IN SIZE OF CLUSTER Ch. 8 

the sampling from any other first-stage unit in the set, then, by the corollary 
to Theorem 12, Ch. 3 (p. 58), 

1 1 


Vy'lfi] 


— 2 t>2 


rv 


Since within the /th selected first-stage unit the units are a simple 
random sample from the units, then by Sec. 3, Ch. 4, 


Hence, 


„ N. — n, ^ 

O'x.V.'IJl] = ^ ~ 


iXY 


i 


m 


1 m M p_ A/. — M . 


(1.7) 


which when divided by XXis equal to the second term in the right-hand 
member of Eq. 1.3. 

Consider now the second term in the right-hand member of Eq. 1.6, 

namely, (yE{x'm)E{y'my 

By Theorem 6, Ch. 3 (p. 49), 


Similarly, 

Therefore, 


1 w \ N. 

1 m Y- 


(^FAx/\[l])Eiy'\\l]) ~ S f ^ Zl. 

Pi' mi Pi 


m i Pi 


( 1 . 8 ) 


and since the in first-stage units are selected with replacement, is 

independent of X-jP^, and by the corollary to Theorem 12, Ch. 3 (p. 58), 


m 

Eq- E8 = -T2<r^, 

• Pi.,’ Pi., 

Since the probability of selecting the /th first-stage unit is P 

Y' X 

i A i I 4 


(1.9) 


Y’- 

y 


Similarly, 





Sec. 2 
Therefore, 


OPTIMUM PROBABILITIES 


197 


El IL' 

Px Px 


-(?-)(?- 


Y 


- Y 


( 1 . 10 ) 


Equation 1.10, substituted into Eq. 1.9 and divided by ZF, is equal to 
the first term in the right-hand member of Eq. 1.3. 


2. Determination of optimum probabilities for a two-stage sampling 
design in which the first-stage units are selected with replacement, the 
sample is self-weighting, and a simple cost function is used (Vol. I, Ch. 8, 
Sec. 14). To prove: Consider a subsampling design in which primary 
units are sampled with replacement and a subsample of listing units is 
selected within sample psu’s. Suppose that the population consists of L 
size classes with Mj, primary units in the hth size class and listing units 
in the hith psu, and that is the probability of selection of a primary 
unit in the Mh size class on a single draw. Then the estimate r ~ ^'ly\ 
where x' and y' are given in Eq. 1.1 may be written 


r 


x' 

y' 


nhi 

\ L mK fJ 2 
JL V y lE J. _ 

^ h i Pfi P-hi 
n/i, 

1 L ruh JsJ yhij 
”22 ^ , 3 __ 


m h 


( 2 . 1 ) 


L 

where m =■■ is the number of psu’s in the sample, and is the 

h 

number of elements included in the sample from the hith psu if it is drawn. 

If also the sample is self-weighting, i.e., P,inJNj,,) = k, then/-- mk 
is the over-all sampling ratio, r = xjy, and the rel-variance of r becomes 


mX^ 




NA) + ^ 


( 2 . 2 ) 


AI = 






i 


M,N„ 



(2.3) 

(2.4) 


where 





198 CONTROL OF VARIATION IN SIZE OF CLUSTER Ch. 8 
with defined by Eq. 1.5, Ch. 7, and where 

1 xM, 

The optimum values of m, and k subject to a fixed total cost and 

L 

subject to the condition where the cost function is 

C = C;m + Clmj^PnM.N, + C,mkN (2.6 or 1-8.14.8) 
are given by _ 

P = ^ (2.7) 

^ " V c; + ciN, 

_ - ( 2 . 8 ) 

VM / ^I" ^h^2h 

V c; + ciN, 

m = c/(c; + Cl2PnM,N, + C,kN) (2.9) 

For a particular fixed system of probabilities the optimum k is 


\C[ + C^P^M^N^j 

Cl2^M,(Al-NA) 

h 


( 2 . 10 ) 


and the optimum m is as given by Eq. 2.9. 

Proof. If the estimate is given by Eq. 2.1, the rel-variance is (from 
Sec. 1) Vf = V% + 2F^y, with terms defined as follows: 

mXY 


L xM„M 2 M. — n.. 

2 IT-' hi ■‘■'hi ‘hi ci 

mXY 


( 2 . 11 ) 


Sec. 2 


OPTIMUM PROBABILITIES 


199 


Substituting X for Y in Eq. 2.11, we obtain and substituting Y for 
we obtain F^ may also be written in the following form: 


7/2 ^_ 


' ^ Mv. L M 

2-^a| + 22i 

- h 7, h i z 


^hi 


^hi^hi 


^Ihi 


i 


( 2 . 12 ) 


Substituting nj,i = kNJPj^, for a self-weighting sample, we obtain F,^ 
as given in Eq. 2.2. 

To obtain the values of Pj,, m, and k which make F^ a minimum subject 

L 

to a fixed total cost and ^Pj^Mj^ = 1, we set up the Lagrangian F: 

h 

F= Vf + + C';m2P^M„N^+ C^kN-~ cj ij 

Then the solution to the equations 5F/dP^ = 0, SF/dk = 0, 5FI3m = 0, 
3F/aAi = 0, 5F/dX^ = 0 will give the optimum values of P^, m, and k. 


dF 

1 m^{xi-^nA) 



P\ mX^ = 0 

(2.13) 

dF 

1 NSl , 


dk 

mX^ ® 

(2.14) 

dF 

dm 

1 UM,(Al-NA) A^.§|1 
m^X^ If P, + fc J 



“1“ + C2^a1 = 0 

(2.15) 


0 is the cost condition, and ~ 0 is the condition 

^PjMk = 1 • From Eq. 2.14, 


Multiplying Eq. 2.13 by PJm, summing to L, and subtracting Eq. 2.15, 
we obtain 


As = mC[X^ = ^ 


C, mX^k^ 


Substituting these values of X^ and A 2 into Eq. 2.13 and solving for Pi 
we obtain 


(Al-NA)C,k^ 
* Ci + ClN, ^2^ 


(2.16) 






Substituting Eq. 2.7 into Eq. 2.16 and solving for k, we obtain the 
optimum k as given in Eq. 2.8. And, solving the cost equation for m, 
we obtain the optimum m as given in Eq. 2.9. 

For a particular fixed system of probabilities, we have the Lagrange 
equations 2.14 and 2.15, and the cost condition. Solving these equations, 
we obtain the optimum k as given in Eq. 2.10. This solution is straight¬ 
forward and is left to the reader. 


3. Optimiiin values for a two-stage stratified sampling design with a 
uniform over-all sampling fraction and a simple cost function, and a com¬ 
parison of optimum probabilities with stratification by size as a control on 
variation in size of psu (Vol. I, Ch. 8, Sec. 12 and 14). To prove: For a 
two-stage stratified design with a uniform over-all sampling fraction the 
optimum and when the cost function is of the simple form 

C = C[m H- + Q/A (3.1 or 1-8.12.1) 

and the total expenditure is fixed, are 


and 


opt. fij, = 


C[ + 


(3.2 or 1-8.12.2) 


opt. mj, = 


- NA /VQ (3.3 or 

Vc; + C\N^ ^2 1-8.12.5) 


where 


Sec. 3 

and the optimum / is 


OPTIMUM VALUES 


201 


opt./^ 


Vq [|vc; + C\N^ -- na + Vc,NS,_ 


(3.5 or 1-8.12.6) 

Proof. When the sample is self-weighting, i.e., f = fufn, the rel- 
variance given by Eq. 1.9, Ch. 7, becomes 

K == ^2 I (^ - l) + Y, 1^(1 -ffNA (3.6) 

where Sl^ is defined by Eq. 1.3, Ch. 7, and Sl^ is defined by Eq. 1.10, 
Ch. 7. The cost function may be written 


C = 2(C; -f C{N^)M„ f + CJN (3.7) 

To obtain the values of and / which minimize Vf subject to a fixed 
total expenditure set up the Lagraugian F: 


2(C; + ClN,)M,f + Q/V- C 

^ hh ) 

Then the solution of the equations dFjdf^j, -- 0, dFjdf 0, and the cost 
equation 3.7 will give the optimum values as shown below: 

E. = ^ _ « _ 5 + cin,)Mj 

3/2. XJ xy /!,. (^-8) 

_ iMEvJsh_ , INAf.,, 

3/ xy2 x^p + x'^p 

, Jl(c;+C;V,)M, 


+ 


+ C,V =0 (3.9) 


Multiplying Eq. 3.8 by/^;,, summing to L, and adding Eq. 3.9 multiplied 
by /, we obtain 


From Eq. 3.8 


CYX^ 




■' X\C[ + C\N^) 

Equate the right-hand members of Eq. 3.10 and 3.11 and solve for the 







202 CONTROL OF VARIATION IN SIZE OF CLUSTER Ch. 8 

optimum expected size of ultimate cluster in the hth stratum, 
to obtain Eq. 3.2. _ 

Now, since substitute Eq. 3.2 for Nfj 2 h to obtain the 

optimum nij^ given in Eq. 3,3. Substitute Eq. 3.3 into the cost function 

L . 

for m = 2^71 and for rrij, and solve for /to obtain the optimum / given 
in Eq. 3.5. 

Comparison of optimum probabilities with stratification by size. If the 
stratification is by size and the strata are the size classes used in Sec. 2, 
then Slj, and Si in Sec. 2 are the same as Slj, and Si in this section. We 
saw in Sec. 2 that the over-all sampling fraction is/ — mPfij^jNj^. Hence 
mPj^ is comparable to mJMf^. Note from Eq. 2.7 and Eq. 3.3 that mPj^ 
and mJMj^ are proportionate to quantities that differ only in that A| in 
Eq. 2.7 replaces SIj, in Eq. 3.3. The difference between and SIj, is 
approximately 

’’■(t-f)’ 

so that the two expressions are approximately the same whenever the 
XJ Yj^ do not vary a great deal from stratum to stratum. 

Proof From Eq. 2.3, 

\ Mk IV F\2 1 Mn 

^ Yl 

= / M (3-12) 

with R = XIY. From Eq. 1.3, Ch. 7, 


R?,,, + R?Sl, 




2 VI, - M,Xl + R^2 Y% - n 


2R + 2RMYj^ 


For Mj, large, so that the assumption Mj, = (Mf,~ 1) is valid, 


RY,f = n 


4. Comparison of (Eq. 5.4 of Ch. 6) and (Eq. 5.11 of Ch. 6) (Vol. I, 
Ch. 8, Sec. 1 and 11, also Vol. I, Ch. 6, Sec. 8). To prove: For the class 
of populations described in Sec. 8, Ch. 6, we may expect that > 1, 

where is given by Eq. 5.4 of Ch. 6 and is given by Eq. 5.11 of Ch. 6. 


Sec. 5 EFFECT OF VARIABLE CLUSTER SIZE 203 

Proof, and can be restated in the forms 


M 

1 M 

N- 


mz] 

i 

—^1 
N 7 

- 4 - 


■ (4.1) 

MN^X^ 


NX^ 

M 

M Ni 



mz! 


2.f 


MNX^ NX^ 


(4.2) 


where for we have assumed that N is large so that A/(A— 1) is very 
close to 1, and 

z„. = (4.3) 

Zi=%Z,,= X,~RY, ( 4 . 4 ) 

and 






(4.5) 


The last term in Eq. 4.1 will be nearly equivalent to the last term in 
Eq. 4.2 provided the are moderately large (it may be about equal 
under less stringent conditions). When these last terms are about equal, 
then 

M ^ M _ 


where is the covariance of V,- and V,-2f. For the class of 

populations described in Sec. 8, Ch. 6, this covariance is positive, and 
thus for many common sampling problems 

f2 

72 ^ ^ 

Illustrations are given in Case Study D of Ch. 12, Vol. 1. (Compare, 
also, Sec. 2, Ch. 9, where it is shown that sampling with probability 
proportionate to size, under the same conditions, gains over sampling 
with equal probability.) 


5. Effect of variation in size of cluster in estimating totals (Vol. I, Ch, 8, 
Sec. 4). Consider a one- or more stage cluster sample design with a 




204 CONTROL OF VARIATION IN SIZE OF CLUSTER Ch. 8 

uniform over-all sampling fraction,/, and with m primary units included 
in the sample; it is desired to estimate X. If a simple unbiased estimate, 
x\ is used, where 

x' = yx (5.1) 

and X =- is the aggregate value of the X»characteristic for the units 
in the sample, then, 

^ F? + Vl + 2p,,F,F, (5.2 or I-8.4.1) 

where n is the number of elementary units in the sample, or some 

other aggregate measure of size associated with the units in the sample, 
and where F^ is the rel-variance of xjn, Vl is the rel-variance of n, and 
is the coefficient of correlation of r and n. 

Exercises 

5.1. Show that Eq. 5.2 holds approximately and that the last two terms 
become zero if the sample selection is made in such a manner that, if two or 
more stages of sampling are used, the first-stage units are selected with proba¬ 
bility proportionate to N^, the size of the zth unit. 

5.2. Let Hi where Ni is the size of the ith first-stage unit, and where 

the first-stage units are a simple random sample. Show that the last two terms 
of Eq. 5.2 vanish if the estimate {xjn)N is used. 

Remark. The importance of the relationship given by Eq. 5.2 is that it separates the rel-variance of an estimated total into a component due to the variation in size of cluster and a component representing the variance that would arise if the problem were to estimate the ratio $x/n$. The variation in size of cluster usually has much less effect on the variance of a ratio than on an estimated total based on an expansion of the sample by the reciprocal of the sampling fraction, and therefore the latter two terms in Eq. 5.2 represent the principal contribution of variation in size of cluster. The last term will often be small, in which case the contribution of the variation in size of cluster is given approximately by $V_n^2$.
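Eq. 5.2 can be checked numerically. The Python sketch below is an illustration under stated assumptions. It uses a one-stage design (a simple random sample of m clusters, so f = m/M), hypothetical cluster sizes N_i and totals X_i, and helper names of our own choosing. Because it enumerates every possible sample, the rel-variances it reports are exact for this toy population rather than simulated.

```python
from itertools import combinations
from statistics import fmean, pvariance

def relvar(values):
    """Rel-variance: population variance divided by the squared mean."""
    return pvariance(values) / fmean(values) ** 2

def size_variation_components(sizes, totals, m):
    """Exact rel-variances over all equally likely simple random samples of m clusters.

    One-stage design with f = m/M: x' = x/f (Eq. 5.1); n is the summed cluster size.
    """
    M = len(sizes)
    f = m / M
    x_prime, ratios, n_vals = [], [], []
    for sample in combinations(range(M), m):
        x = sum(totals[i] for i in sample)   # sample aggregate of the X-characteristic
        n = sum(sizes[i] for i in sample)    # number of elementary units in the sample
        x_prime.append(x / f)
        ratios.append(x / n)
        n_vals.append(n)
    r_mean, n_mean = fmean(ratios), fmean(n_vals)
    cov = fmean([(r - r_mean) * (k - n_mean) for r, k in zip(ratios, n_vals)])
    rho = cov / (pvariance(ratios) * pvariance(n_vals)) ** 0.5
    v_r, v_n = relvar(ratios), relvar(n_vals)
    return {
        "V2_xprime": relvar(x_prime),   # exact rel-variance of x'
        "V2_x_over_n": v_r,             # also the rel-variance of (x/n)N, since N is fixed
        "V2_n": v_n,
        "rho": rho,
        "eq_5_2": v_r + v_n + 2 * rho * (v_r * v_n) ** 0.5,  # right-hand side of Eq. 5.2
    }

sizes = [2, 5, 9, 14, 20, 30]        # hypothetical cluster sizes N_i
totals = [7, 14, 28, 41, 61, 89]     # hypothetical cluster totals X_i, roughly 3 * N_i
result = size_variation_components(sizes, totals, m=2)
for name, value in result.items():
    print(f"{name:12s} {value:.5f}")
```

In this population the totals are nearly proportional to size. One would therefore expect most of the rel-variance of x' to come from V_n^2, and the estimate (x/n)N of Exercise 5.2 avoids that component.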

REFERENCES 

(1) U.S. Bureau of the Census, Sampling Staff, A Chapter in Population 
Sampling, U.S. Government Printing Office, Washington, D.C., 1947. 

(2) W. G. Cochran, “Sampling Theory When the Sampling Units Are of 
Unequal Size,” J. Amer. Stat. Assn., 37 (1942), 199-212. 

(3) M. H. Hansen and W. N. Hurwitz, “On the Determination of Optimum 
Probabilities in Sampling,” Annals Math. Stat., 20 (1949), 426-432. 

(4) D. G. Horvitz and D. J. Thompson, “A Generalization of Sampling without Replacement from a Finite Universe,” J. Amer. Stat. Assn., 47 (1952), 663-685.

(5) Hiroshi Midzuno, “An Outline of the Theory of Sampling Systems,” Annals Inst. Stat. Math. (Japan), 1 (1950), 149-156.



CHAPTER 9 


Multi-Stage Sampling with Large Primary 
Sampling Units 


DERIVATIONS, PROOFS, AND SOME EXTENSIONS OF 
THEORY FOR CH. 9 OF VOL. I* 

Note. So far as the theory is concerned, there is no distinction in multi-stage 
sampling whether the primary sampling units are large or small. However, 
some principles and methods become more important with large psu’s, and 
these are emphasized in this chapter. They include substratification in sampling 
second-stage units, extensive use of varying probabilities in the selection of 
primary units, inclusion of a small number of primary units in the sample per 
primary stratum, the determination of optimum sizes of strata, and allowance 
for travel within psu’s by a separate term in the cost function. 

Of course, the theory introduced here for large psu’s is applicable to any 
problem in which the principles are applied, whatever the size of the psu’s. 

Some notation used in this chapter. 

$X_{hiajk}$ = value of the X-characteristic for the $hiajk$th third-stage unit in the population; i.e., $X_{hiajk}$ is the value for the $k$th third-stage unit in the $j$th second-stage unit in the $a$th substratum in the $i$th primary unit in the $h$th primary stratum.

$x_{hiajk}$ = value of the X-characteristic for the $hiajk$th third-stage unit in the sample.

$Y_{hiajk}$ and $y_{hiajk}$ represent similar values for a Y-characteristic.

Numbers of sampling units and strata are as follows (in the sample; in the population):

Number of third-stage units in the $hiaj$th second-stage unit: $q_{hiaj}$; $Q_{hiaj}$.
Number of second-stage units in the $a$th substratum of the $hi$th primary unit: $n_{hia}$; $N_{hia}$.
Number of substrata in the $hi$th primary unit: $D_{hi}$; $D_{hi}$.
Number of primary units in the $h$th stratum: $m_h$; $M_h$.
Number of primary strata: $L$; $L$.

With the notation above the following additional notation is defined:

Totals. In the sample: $x_{hiaj} = \sum_k^{q_{hiaj}} x_{hiajk}$, $x_{hia} = \sum_j^{n_{hia}} x_{hiaj}$, $q_{hia} = \sum_j^{n_{hia}} q_{hiaj}$, $m = \sum_h^L m_h$. In the population: $X_{hiaj} = \sum_k^{Q_{hiaj}} X_{hiajk}$, $X_{hia} = \sum_j^{N_{hia}} X_{hiaj}$, $Q_{hia} = \sum_j^{N_{hia}} Q_{hiaj}$, $X = \sum_h^L X_h$, $N = \sum_h^L N_h$, $Q = \sum_h^L Q_h$, $M = \sum_h^L M_h$. Other totals (such as $x_{hi}$, $X_{hi}$, $X_h$, $n$, $q$) are formed by the same kind of summation.

Average values per third-stage unit: $\bar x_{hiaj} = x_{hiaj}/q_{hiaj}$ and $\bar X_{hiaj} = X_{hiaj}/Q_{hiaj}$; over-all, $x/q$ and $X/Q$.

Average values per second-stage unit: $\bar x_{hia} = x_{hia}/n_{hia}$, $\bar X_{hia} = X_{hia}/N_{hia}$, $\bar q_{hia} = q_{hia}/n_{hia}$, $\bar Q_{hia} = Q_{hia}/N_{hia}$; over-all, $x/n$ and $X/N$, with $\bar q = q/n$ and $\bar Q = Q/N$.

Average values per primary unit: $\bar X_h = X_h/M_h$; over-all, $x/m$ and $X/M$, with $\bar n = n/m$ and $\bar N = N/M$.

Average values per primary stratum: $X/L$, $\bar m = m/L$, $\bar M = M/L$.

Ratios: $r = x/y$, $R = X/Y$; $r_h = x_h/y_h$, $R_h = X_h/Y_h$; $r_{hi} = x_{hi}/y_{hi}$, $R_{hi} = X_{hi}/Y_{hi}$.

(If no substrata are used the $a$ is omitted, and if no primary strata are used the $h$ is omitted.)

Simple unbiased estimates of totals are: $x'$ is the estimate of $X$, $x'_h$ is the estimate of $X_h$, $x'_{hi}$ is the estimate of $X_{hi}$, etc. (See Sec. 1 for further definitions of these estimates.)

$P_{hi}$ = probability of drawing the $i$th psu from the $h$th stratum in a single draw (i.e., for a sample of 1 psu).

Sampling fractions are:

$f_{hiaj} = q_{hiaj}/Q_{hiaj}$ is the third-stage sampling fraction.

$f_{hia} = n_{hia}/N_{hia}$ is the second-stage sampling fraction in the $a$th substratum.

$f$ is a uniform over-all sampling fraction.

$f = m_hP_{hi}f_{hia}f_{hiaj}$ for three-stage sampling with a uniform over-all sampling fraction.

* Appropriate references to Vol. I are shown in parentheses after section or subsection headings. The number following I- after some equations gives the chapter, section, and number of that particular equation in Vol. I.









*1. The rel-variance of a ratio estimate for a multi-stage stratified sampling design (Vol. I, Ch. 9, Sec. 14 and 26). a. General case. Assume that we have a population consisting of $L$ primary strata, $M_h$ psu's in the $h$th primary stratum, and $D_{hi}$ substrata in the $i$th psu of the $h$th stratum. Then an estimate of $R = X/Y$ is given by

$$r = \frac{x'}{y'} = \frac{\sum_h^L \frac{1}{m_h}\sum_i^{m_h}\frac{x'_{hi}}{P_{hi}}}{\sum_h^L \frac{1}{m_h}\sum_i^{m_h}\frac{y'_{hi}}{P_{hi}}} \qquad (1.1)$$

where $m_h$ is the number of first-stage units included in the sample from the $h$th stratum, $P_{hi}$ is the probability of selection of the $i$th psu of the $h$th stratum on a single draw, and $x'_{hi}$ and $y'_{hi}$ are unbiased estimates of $X_{hi}$ and $Y_{hi}$. Assume further that the first-stage units are selected with replacement but that the second- and subsequent-stage units are selected without replacement.

To prove: We shall show that

$$V_r^2 \doteq V_{x'}^2 + V_{y'}^2 - 2V_{x'y'} \qquad (1.2)$$

with

$$V_{x'y'} = \frac{\sum_h^L \frac{1}{m_h}\sigma_{hXY}}{XY} + \frac{\sum_h^L \frac{1}{m_h}\sum_i^{M_h}\frac{1}{P_{hi}}\sigma_{x'_{hi}y'_{hi}}}{XY} \qquad (1.3)$$

where

$$\sigma_{hXY} = \sum_i^{M_h} P_{hi}\left(\frac{X_{hi}}{P_{hi}} - X_h\right)\left(\frac{Y_{hi}}{P_{hi}} - Y_h\right) \qquad (1.4)$$

$V_{x'}^2 = V_{x'x'}$ and $V_{y'}^2 = V_{y'y'}$, and $\sigma_{x'_{hi}y'_{hi}}$ is the covariance between $x'_{hi}$ and $y'_{hi}$, the estimated totals for the $i$th psu in the $h$th stratum.

Proof. From the general theorem on the rel-variance of a ratio (see Ch. 4, Sec. 11), the rel-variance of $r$ may be written

$$V_r^2 \doteq V_{x'}^2 + V_{y'}^2 - 2V_{x'y'}$$

We shall show that $V_{x'y'}$ is given by Eq. 1.3. By definition,

$$V_{x'y'} = \frac{\sigma_{x'y'}}{XY}$$

Since the sample is drawn independently from stratum to stratum,

$$V_{x'y'} = \frac{1}{XY}\sum_h^L \sigma_{x'_h y'_h} \qquad (1.6)$$

* May be deferred.

where

$$x'_h = \frac{1}{m_h}\sum_i^{m_h}\frac{x'_{hi}}{P_{hi}} \qquad (1.8)$$

and

$$y'_h = \frac{1}{m_h}\sum_i^{m_h}\frac{y'_{hi}}{P_{hi}} \qquad (1.9)$$


To evaluate $\sigma_{x'_h y'_h}$ we make use of Theorem 15, Ch. 3 (p. 65), which expresses the covariance in terms of a contribution from sampling first-stage units and a contribution from sampling within first-stage units. Thus, from Theorem 15, Ch. 3, with $u = x'_h$, $v = y'_h$, and the random event $b$ being the selection of a set of $m_h$ first-stage units, we have

$$\sigma_{x'_h y'_h} = E\,\sigma_{x'_h y'_h[1]} + \sigma_{E(x'_h|[1])\,E(y'_h|[1])} \qquad (1.10)$$

where $\sigma_{x'_h y'_h[1]}$ represents the covariance between $x'_h$ and $y'_h$ within fixed first-stage units, and $E\,\sigma_{x'_h y'_h[1]}$ is the expected value of this conditional covariance. In the second term of Eq. 1.10, $E(x'_h|[1])$, which we will also write as $E_{[1]}x'_h$, represents the conditional expected value of $x'_h$ for a fixed set of first-stage units, $E(y'_h|[1])$ is similarly defined, and $\sigma_{E_{[1]}x'_h\,E_{[1]}y'_h}$ is the covariance between first-stage units within strata.

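Consider the evaluation of $\sigma_{E_{[1]}x'_h\,E_{[1]}y'_h}$. By Theorem 14, Ch. 3 (p. 61),

$$E_{[1]}x'_h = \frac{1}{m_h}\sum_i^{m_h}\frac{E_{hi}\,x'_{hi}}{P_{hi}}$$

where $E$ with a subscript represents the conditional expected value of the expression following it for the fixed values represented by the subscripts. Since $x'_{hi}$ is an unbiased estimate of $X_{hi}$,

$$E_{[1]}x'_h = \frac{1}{m_h}\sum_i^{m_h}\frac{X_{hi}}{P_{hi}} \qquad (1.11)$$

Similarly,

$$E_{[1]}y'_h = \frac{1}{m_h}\sum_i^{m_h}\frac{Y_{hi}}{P_{hi}}$$

Therefore, since the psu's are selected with replacement,

$$\sigma_{E_{[1]}x'_h\,E_{[1]}y'_h} = \frac{1}{m_h}E\left(\frac{X_{hi}}{P_{hi}} - E\frac{X_{hi}}{P_{hi}}\right)\left(\frac{Y_{hi}}{P_{hi}} - E\frac{Y_{hi}}{P_{hi}}\right)$$

Now,

$$E\frac{X_{hi}}{P_{hi}} = \sum_i^{M_h}P_{hi}\frac{X_{hi}}{P_{hi}} = X_h \qquad (1.12)$$

since the probability of selecting the $hi$th primary unit is $P_{hi}$. Similarly, $E(Y_{hi}/P_{hi}) = Y_h$.

Hence,

$$\sigma_{E_{[1]}x'_h\,E_{[1]}y'_h} = \frac{1}{m_h}\sigma_{hXY} \qquad (1.13)$$

where $\sigma_{hXY}$ is given by Eq. 1.4.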

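Consider now the evaluation of $E\,\sigma_{x'_h y'_h[1]}$. By Eq. 6.2, Ch. 3,

$$\sigma_{x'_h y'_h[1]} = E_{[1]}\left(x'_h - E_{[1]}x'_h\right)\left(y'_h - E_{[1]}y'_h\right) \qquad (1.14)$$

Since the sampling is carried out independently within the selected primary units, the conditional covariance, Eq. 1.14, becomes

$$\sigma_{x'_h y'_h[1]} = \frac{1}{m_h^2}\sum_i^{m_h}\frac{1}{P_{hi}^2}E_{hi}\left(x'_{hi} - X_{hi}\right)\left(y'_{hi} - Y_{hi}\right) = \frac{1}{m_h^2}\sum_i^{m_h}\frac{\sigma_{x'_{hi}y'_{hi}}}{P_{hi}^2}$$

and, since the $i$th psu is selected on each of the $m_h$ draws with probability $P_{hi}$,

$$E\,\sigma_{x'_h y'_h[1]} = \frac{1}{m_h}\sum_i^{M_h}\frac{1}{P_{hi}}\sigma_{x'_{hi}y'_{hi}} \qquad (1.15)$$

Substitute Eq. 1.13 and 1.15 into 1.10 and then substitute Eq. 1.10 into Eq. 1.6 to obtain $V_{x'y'}$ as given by Eq. 1.3.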
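Eq. 1.13 and 1.15 can be checked numerically in the simplest case. Take one stratum, psu's drawn with replacement with probabilities P_hi, and a simple random sample of q elements drawn independently within each selected psu. The sketch below enumerates every possible sample exactly with Python's `fractions` module. It then compares Cov(x'_h, y'_h) with (σ_hXY + Σ_i σ_{x'_hi y'_hi}/P_hi)/m_h. The population, probabilities, and function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations, product

def srs_total_cov(xs, ys, q):
    """Cov of (Q/q)*sum x and (Q/q)*sum y for an SRS of q from Q units, without replacement."""
    Q = len(xs)
    xbar, ybar = F(sum(xs), Q), F(sum(ys), Q)
    S_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (Q - 1)
    return Q * Q * (1 - F(q, Q)) * S_xy / q

def draw_outcomes(psus, probs, q):
    """All outcomes of one pps draw followed by an SRS of q units: (prob, x'/P, y'/P)."""
    out = []
    for (xs, ys), p in zip(psus, probs):
        subsamples = list(combinations(range(len(xs)), q))
        for sub in subsamples:
            expand = F(len(xs), q)
            x_hat = expand * sum(xs[k] for k in sub)
            y_hat = expand * sum(ys[k] for k in sub)
            out.append((p / len(subsamples), x_hat / p, y_hat / p))
    return out

def exact_cov(psus, probs, q, m):
    """Exact Cov(x'_h, y'_h) by enumerating all m-draw samples (psu's with replacement)."""
    outcomes = draw_outcomes(psus, probs, q)
    ex = ey = exy = F(0)
    for draws in product(outcomes, repeat=m):
        prob = F(1)
        for pr, _, _ in draws:
            prob *= pr
        xh = sum(d[1] for d in draws) / m   # Eq. 1.8
        yh = sum(d[2] for d in draws) / m   # Eq. 1.9
        ex += prob * xh
        ey += prob * yh
        exy += prob * xh * yh
    return exy - ex * ey

def formula_cov(psus, probs, q, m):
    """(1/m_h) sigma_hXY from Eq. 1.13 plus the within-psu term of Eq. 1.15."""
    X = sum(sum(xs) for xs, _ in psus)
    Y = sum(sum(ys) for _, ys in psus)
    sigma_hXY = sum(p * (sum(xs) / p - X) * (sum(ys) / p - Y)
                    for (xs, ys), p in zip(psus, probs))          # Eq. 1.4
    within = sum(srs_total_cov(xs, ys, q) / p for (xs, ys), p in zip(psus, probs))
    return (sigma_hXY + within) / m

psus = [([1, 2, 4], [1, 1, 3]), ([3, 5, 6], [2, 3, 3]), ([7, 8, 12], [4, 5, 6])]
probs = [F(1, 5), F(3, 10), F(1, 2)]   # hypothetical single-draw probabilities P_hi
print(exact_cov(psus, probs, q=2, m=2), formula_cov(psus, probs, q=2, m=2))
```

Exact rational arithmetic is used so that agreement is an equality rather than a floating-point approximation.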

b. The rel-variance of $r$ for a three-stage design with substratification. For this design $r$ is given by Eq. 1.1 with

$$x'_{hi} = \sum_a^{D_{hi}}\frac{1}{f_{hia}}\sum_j^{n_{hia}}\frac{1}{f_{hiaj}}x_{hiaj} \qquad (1.16)$$

where $f_{hia} = n_{hia}/N_{hia}$, $f_{hiaj} = q_{hiaj}/Q_{hiaj}$, $N_{hia}$ is the number of second-stage units in the population in the $a$th substratum in the $hi$th psu, $Q_{hiaj}$ is the number of third-stage units in the population in the $hiaj$th second-stage unit, $n_{hia}$ and $q_{hiaj}$ are the corresponding numbers in the sample, and $x_{hiaj} = \sum_k^{q_{hiaj}}x_{hiajk}$. The $y'_{hi}$ are similarly defined.

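$V_r^2$ is given by Eq. 1.2 and 1.3 with

$$\sigma_{x'_{hi}y'_{hi}} = \sum_a^{D_{hi}}N_{hia}^2\frac{N_{hia} - n_{hia}}{N_{hia}}\frac{S_{hiaXY}}{n_{hia}} + \sum_a^{D_{hi}}\frac{N_{hia}}{n_{hia}}\sum_j^{N_{hia}}Q_{hiaj}^2\frac{Q_{hiaj} - q_{hiaj}}{Q_{hiaj}}\frac{S_{hiajXY}}{q_{hiaj}} \qquad (1.17)$$

where

$$S_{hiaXY} = \frac{1}{N_{hia} - 1}\sum_j^{N_{hia}}\left(X_{hiaj} - \bar X_{hia}\right)\left(Y_{hiaj} - \bar Y_{hia}\right), \qquad \bar X_{hia} = \frac{1}{N_{hia}}\sum_j^{N_{hia}}X_{hiaj} \qquad (1.18)$$

and

$$S_{hiajXY} = \frac{1}{Q_{hiaj} - 1}\sum_k^{Q_{hiaj}}\left(X_{hiajk} - \bar X_{hiaj}\right)\left(Y_{hiajk} - \bar Y_{hiaj}\right), \qquad \bar X_{hiaj} = \frac{X_{hiaj}}{Q_{hiaj}} = \frac{1}{Q_{hiaj}}\sum_k^{Q_{hiaj}}X_{hiajk} \qquad (1.19)$$

with $\bar Y_{hia}$ and $\bar Y_{hiaj}$ similarly defined.

Equation 1.17 may be obtained by applying Theorem 15, Ch. 3 (p. 65), to $\sigma_{x'_{hi}y'_{hi}}$. The details are the same as written out in Sec. 1 of Ch. 7 and Sec. 1 of Ch. 6 and are left to the reader.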

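Note that by combining corresponding terms of $V_{x'}^2$, $V_{y'}^2$, and $V_{x'y'}$, we may write the rel-variance as follows:

$$V_r^2 \doteq \frac{1}{X^2}\sum_h^L\frac{\sigma_h^2}{m_h} + \frac{1}{X^2}\sum_h^L\frac{1}{m_h}\sum_i^{M_h}\frac{1}{P_{hi}}\sum_a^{D_{hi}}N_{hia}^2\frac{N_{hia} - n_{hia}}{N_{hia}}\frac{S_{hia}^2}{n_{hia}} + \frac{1}{X^2}\sum_h^L\frac{1}{m_h}\sum_i^{M_h}\frac{1}{P_{hi}}\sum_a^{D_{hi}}\frac{N_{hia}}{n_{hia}}\sum_j^{N_{hia}}Q_{hiaj}^2\frac{Q_{hiaj} - q_{hiaj}}{Q_{hiaj}}\frac{S_{hiaj}^2}{q_{hiaj}} \qquad (1.20)$$

where

$$\sigma_h^2 = \sigma_{hX}^2 + R^2\sigma_{hY}^2 - 2R\sigma_{hXY} \qquad (1.21 \text{ or I-9.30.1})$$

$$S_{hia}^2 = S_{hiaX}^2 + R^2S_{hiaY}^2 - 2RS_{hiaXY} \qquad (1.22 \text{ or I-9.30.2})$$

$$S_{hiaj}^2 = S_{hiajX}^2 + R^2S_{hiajY}^2 - 2RS_{hiajXY} \qquad (1.23 \text{ or I-9.30.3})$$

and $\sigma_{hX}^2 = \sigma_{hXX}$, $S_{hiaX}^2 = S_{hiaXX}$, and so on.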

c. Special case when $P_{hi} = 1/M_h$ (i.e., equal probability of selection of psu's within a stratum) and the sampling of primary units is made without replacement. The evaluation of the covariance proceeds exactly as above for the contributions from the second and third stages of sampling. However, the first-stage contribution is evaluated under the assumption that the psu's were drawn without replacement and with equal probability of selection within a stratum. In Part a, by contrast, it was assumed that the psu's were drawn independently (with replacement) and with varying probabilities of selection within a stratum. Hence, by substituting $1/M_h$ for $P_{hi}$ in Eq. 1.11 and then recognizing that primary units are selected without replacement, we have for the first-stage contribution

$$\sigma_{E_{[1]}x'_h\,E_{[1]}y'_h} = \sigma\left(\frac{M_h}{m_h}\sum_i^{m_h}X_{hi},\ \frac{M_h}{m_h}\sum_i^{m_h}Y_{hi}\right) \qquad (1.24)$$

and, recognizing again that the covariance of these two estimates is given by the procedure of Sec. 3, Ch. 4, we have

$$\text{Eq. 1.24} = M_h^2\frac{M_h - m_h}{m_hM_h}S_{hXY}, \qquad S_{hXY} = \frac{1}{M_h - 1}\sum_i^{M_h}\left(X_{hi} - \bar X_h\right)\left(Y_{hi} - \bar Y_h\right) \qquad (1.25)$$

Thus, $\sigma_{x'_h y'_h}$ is given by Eq. 1.25 plus Eq. 1.15 with $1/M_h$ substituted for $P_{hi}$.
d. $V_r^2$ for a three-stage design when we have a self-weighting sample with $m_h = \bar m = m/L$ for all $h$, with the second-stage sampling fraction small relative to 1, and with constant third-stage sampling fractions. In this case the rel-variance may be written

$$V_r^2 \doteq \frac{B^2}{m} + \frac{W_b^2}{m\bar n} + \frac{\bar Q - \bar q}{\bar Q}\,\frac{W_w^2}{m\bar n\bar q} \qquad (1.26)$$

where

$$B^2 = B_X^2 + B_Y^2 - 2B_{XY}, \qquad B_{XY} = \frac{L\sum_h^L\sigma_{hXY}}{XY}, \qquad B_X^2 = B_{XX},\quad B_Y^2 = B_{YY}$$

$$W_b^2 = W_{bX}^2 + W_{bY}^2 - 2W_{bXY}, \qquad W_{bXY} = \frac{N\sum_h^L\sum_i^{M_h}\sum_a^{D_{hi}}N_{hia}S_{hiaXY}}{XY}, \qquad W_{bX}^2 = W_{bXX},\quad W_{bY}^2 = W_{bYY}$$

$$W_w^2 = W_{wX}^2 + W_{wY}^2 - 2W_{wXY}, \qquad W_{wXY} = \frac{Q\sum_h^L\sum_i^{M_h}\sum_a^{D_{hi}}\sum_j^{N_{hia}}Q_{hiaj}S_{hiajXY}}{XY}, \qquad W_{wX}^2 = W_{wXX},\quad W_{wY}^2 = W_{wYY}$$

Throughout this subsection (including Eq. 1.26), $f_3$ denotes the constant third-stage sampling fraction. Then $\bar q = f_3\bar Q$ is the expected number of third-stage units in the sample per second-stage unit in the sample, and $\bar n = fN/(f_3m)$ is the expected number of second-stage units in the sample per primary unit in the sample.

These results follow from Eq. 1.20 with the appropriate substitutions, namely,

$$m_h = \frac{m}{L}, \qquad 1 - f_{hia} \doteq 1, \qquad f = m_hP_{hi}f_{hia}f_{hiaj}, \qquad f_{hiaj} = f_3, \qquad n = m\bar n, \qquad q = m\bar n\bar q$$


2. Conditions under which a gain is to be expected by using probability proportionate to size (Vol. I, Ch. 9, Sec. 11). To prove: For many common populations, and with a fixed number, $m$, of primary units in the sample, sampling with probability proportionate to size will give a smaller variance than sampling with equal probability. The relative reduction in variance will not, however, exceed $V_N^2/(1 + V_N^2)$.

Proof. Consider drawing a sample, with replacement, of $m$ psu's from a population of $M$ psu's, and making the estimate $r = x'/y'$ (Eq. 1.1). Then, from Eq. 1.2 and 1.3, the between-psu contribution to the rel-variance of $r$ can be stated in the form:*

* See also Sec. 4, Ch. 8, for a related comparison.

(a) If the psu's are sampled with equal probability:

$$V_1^2 = \frac{M}{mX^2}\sum_i^MN_i^2d_i^2 \qquad (2.1)$$

where

$$N_id_i = X_i - RY_i \qquad (2.2)$$

(b) If the psu's are sampled with probability proportionate to size ($P_i = N_i/N$):

$$V_2^2 = \frac{N}{mX^2}\sum_i^MN_id_i^2 \qquad (2.3)$$

Then we gain by using pps whenever $V_1^2 - V_2^2 > 0$, or whenever $mX^2(V_1^2 - V_2^2)/M^2 > 0$. Notice that

$$\frac{mX^2}{M^2}\left(V_1^2 - V_2^2\right) = \frac{1}{M}\sum_i^MN_i\left(N_id_i^2\right) - \frac{N}{M}\cdot\frac{1}{M}\sum_i^MN_id_i^2 \qquad (2.4)$$

is the covariance of $N_i$ and $N_id_i^2$ and will be positive whenever $N_i$ and $N_id_i^2$ are positively correlated, which is the case for the commonly encountered class of populations described in Sec. 8 of Ch. 6.

Also, for this class of populations, i.e., when $N_i$ and $N_id_i^2$ are positively correlated but $N_i$ and $d_i^2$ are negatively correlated, the intercept of the least-squares regression line of $N_id_i^2$ on $N_i$ is positive. Consequently the relative reduction in variance, $(V_1^2 - V_2^2)/V_1^2$, is less than $V_N^2/(1 + V_N^2)$, where $V_N^2$ is the rel-variance of the sizes of the psu's. This follows since, if $w = c + bu$ is the regression of $w$ on $u$, the least-squares value of the intercept, $c$, is $\bar W - \sigma_{uw}/(\bar UV_u^2)$, where $Ew = \bar W$ and $Eu = \bar U$.

In our case $u_i = N_i$ and $w_i = N_id_i^2$. Therefore,

$$c = \frac{1}{M}\sum_i^MN_id_i^2 - \frac{M\sum_i^MN_i^2d_i^2 - N\sum_i^MN_id_i^2}{MNV_N^2}$$

Then the condition of a positive intercept, $c > 0$, is

$$\frac{1}{M}\sum_i^MN_id_i^2 > \frac{M\sum_i^MN_i^2d_i^2 - N\sum_i^MN_id_i^2}{MNV_N^2} \qquad (2.7)$$

from which it follows that

$$V_1^2 < \left(1 + V_N^2\right)V_2^2 \qquad (2.8)$$

It also follows that

$$\frac{V_1^2 - V_2^2}{V_1^2} < \frac{V_N^2}{1 + V_N^2} \qquad (2.9)$$
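The identity in Eq. 2.4 and the bounds in Eq. 2.8 and 2.9 can be checked with exact rational arithmetic. The sketch below works directly with hypothetical values of N_i and d_i^2. (Eq. 2.2 defines d_i from X_i, Y_i, and R; here the values are simply assumed.) They are chosen so that N_i d_i^2 increases with N_i while d_i^2 decreases, as in the class of populations described above. The function names are our own.

```python
from fractions import Fraction as F

def between_psu_relvariances(sizes, d2, m, X):
    """V_1^2 (equal probability, Eq. 2.1) and V_2^2 (pps, Eq. 2.3) from N_i and d_i^2."""
    M, N = len(sizes), sum(sizes)
    V1 = F(M) * sum(n * n * d for n, d in zip(sizes, d2)) / (m * X ** 2)
    V2 = F(N) * sum(n * d for n, d in zip(sizes, d2)) / (m * X ** 2)
    return V1, V2

def pop_cov(u, w):
    """Population covariance with divisor M, as in Eq. 2.4."""
    M = len(u)
    ubar, wbar = F(sum(u)) / M, F(sum(w)) / M
    return sum((a - ubar) * (b - wbar) for a, b in zip(u, w)) / M

sizes = [2, 4, 6, 8]                      # hypothetical psu sizes N_i
d2 = [F(4), F(3), F(5, 2), F(11, 5)]      # hypothetical d_i^2, decreasing in N_i
m, X = 3, F(100)
V1, V2 = between_psu_relvariances(sizes, d2, m, X)
w = [n * d for n, d in zip(sizes, d2)]    # N_i d_i^2, increasing in N_i
VN2 = pop_cov(sizes, sizes) / (F(sum(sizes)) / len(sizes)) ** 2   # rel-variance of the sizes
intercept = (F(sum(w)) / len(w)
             - pop_cov(sizes, w) / pop_cov(sizes, sizes) * F(sum(sizes)) / len(sizes))
print(float((V1 - V2) / V1), float(VN2 / (1 + VN2)), float(intercept))
```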


3. When to equalize the sizes of the strata (Vol. I, Ch. 9, Sec. 8 and 24). To prove: Suppose a constant number of psu's is selected from each stratum, and the psu rel-variances are about the same and remain about the same as the sizes of strata are adjusted. A rule of thumb that gives a rough guide to the optimum sizes of strata in this case is to make the strata equal in terms of $X_h$.

Proof. Assume that the population is divided into $L$ strata. Assume further that the strata are grouped into $G$ classes so that the strata within a class have about the same rel-variances between psu's within strata. If there is no subsampling, an estimate of $X$ can be written

$$x' = \sum_g^G\sum_h^{L_g}\frac{1}{m_g}\sum_i^{m_g}\frac{X_{ghi}}{P_{ghi}} \qquad (3.1)$$

where $X_{ghi}$ is the total for the $i$th psu in the $h$th stratum of the $g$th group, and $m_g = m_{gh}$ is the number of psu's in the sample from each stratum in the $g$th group. Then

$$\sigma_{x'}^2 = \sum_g^G\sum_h^{L_g}\frac{1}{m_g}X_{gh}^2B_{gh}^2 \qquad (3.2)$$

where

$$B_{gh}^2 = \frac{1}{X_{gh}^2}\sum_i^{M_{gh}}P_{ghi}\left(\frac{X_{ghi}}{P_{ghi}} - X_{gh}\right)^2 \qquad (3.3)$$

Suppose that, when some of the psu's are shifted from one stratum to another within the $g$th group, the $B_{gh}^2$ remain about the same for all $L_g$ strata. Then the values of $X_{gh}$ that minimize $\sigma_{x'}^2$ are the values that minimize

$$\sum_h^{L_g}X_{gh}^2B_{gh}^2$$

subject to the condition that $\sum_h^{L_g}X_{gh} = X_g$. This minimum is given by the solution to

$$\frac{\partial F}{\partial X_{gh}} = 2X_{gh}B_{gh}^2 + \lambda = 0 \quad (h = 1, \ldots, L_g) \qquad \text{and} \qquad \sum_h^{L_g}X_{gh} = X_g$$

where

$$F = \sum_h^{L_g}X_{gh}^2B_{gh}^2 + \lambda\left(\sum_h^{L_g}X_{gh} - X_g\right)$$

and where $\lambda$ is the usual Lagrangian multiplier.

Hence, the minimum is obtained when $X_{gh}$ is proportionate to $1/B_{gh}^2$, and if the $B_{gh}^2$ are constant, $X_{gh}$ should be made the same for all strata in a group. It follows that if the $B_{gh}$ are the same for all $g$ and $h$, and $m_g$ is the same for all strata, then $X_{gh}$ should be made the same for all strata.

The psu contribution to the variance with multi-stage sampling is the same as Eq. 3.2 above, and consequently the same values for $X_{gh}$ minimize the between-psu variance with multi-stage sampling.

Remark. Similar results can be shown for ratio estimates. Note also, for ratio estimates, that if the variances (rather than the rel-variances) are about equal in the different strata, then the $Y_{gh}$ should be equalized instead of the $X_{gh}$. Often the $Y_{gh}$ are known and the $X_{gh}$ are not, and the $X_{gh}$ and $Y_{gh}$ are highly correlated. Then the measures of size to be equalized in practice would be the $Y_{gh}$.
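The Lagrangian argument above says the minimizing $X_{gh}$ are proportional to $1/B_{gh}^2$. A small Python sketch makes this concrete. The function names and values are hypothetical. It also checks that moving part of the group total from one stratum to another does not reduce the objective.

```python
def optimal_stratum_totals(X_g, B2):
    """Minimize sum_h X_gh^2 B_gh^2 subject to sum_h X_gh = X_g (solution: X_gh ~ 1/B_gh^2)."""
    weights = [1.0 / b for b in B2]
    total = sum(weights)
    return [X_g * w / total for w in weights]

def objective(X, B2):
    """The quantity minimized in Sec. 3: sum of X_gh^2 B_gh^2."""
    return sum(x * x * b for x, b in zip(X, B2))

print(optimal_stratum_totals(120.0, [0.2, 0.2, 0.2]))   # equal B_gh^2 -> equal totals
B2 = [0.1, 0.2, 0.4]                                     # hypothetical B_gh^2
opt = optimal_stratum_totals(120.0, B2)
print(opt, objective(opt, B2))
```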





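4. Estimate of rel-variance.

a. Multi-stage stratified design with variable fractions. In a multi-stage stratified sampling design with either fixed or variable sampling fractions, we have within each stratum

$$x'_h = \frac{1}{m_h}\sum_i^{m_h}\frac{x'_{hi}}{P_{hi}}, \qquad y'_h = \frac{1}{m_h}\sum_i^{m_h}\frac{y'_{hi}}{P_{hi}}$$

and

$$s_{hx'y'} = \frac{1}{m_h - 1}\sum_i^{m_h}\left(\frac{x'_{hi}}{P_{hi}} - x'_h\right)\left(\frac{y'_{hi}}{P_{hi}} - y'_h\right) \qquad (4.3)$$

Also, since the psu's are drawn with replacement and subsampled independently, the $m_h$ values $x'_{hi}/P_{hi}$ are independent with common expectation $X_h$, and

$$E\,s_{hx'y'} = m_h\,\sigma_{x'_hy'_h} \qquad (4.4)$$

Hence,

$$v_{x'y'} = \frac{1}{x'y'}\sum_h^L\frac{s_{hx'y'}}{m_h} \qquad (4.5)$$

is a consistent estimate of $V_{x'y'}$. Further,

$$v_{x'}^2 = v_{x'x'} \quad\text{and}\quad v_{y'}^2 = v_{y'y'}$$

are consistent estimates of $V_{x'}^2$ and $V_{y'}^2$, respectively, and

$$v_r^2 = v_{x'}^2 + v_{y'}^2 - 2v_{x'y'}$$

is a consistent estimate of $V_r^2$ given in Eq. 1.2.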

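b. Fixed over-all sampling fraction. In a multi-stage stratified sampling design with a fixed over-all sampling fraction the estimate in Eq. 4.5 above becomes

$$v_{x'}^2 = \frac{1}{x^2}\sum_h^L\frac{m_h}{m_h - 1}\sum_i^{m_h}\left(x_{hi} - \bar x_h\right)^2 \qquad (4.6 \text{ or I-9.27.3})$$

$$v_{y'}^2 = \frac{1}{y^2}\sum_h^L\frac{m_h}{m_h - 1}\sum_i^{m_h}\left(y_{hi} - \bar y_h\right)^2 \qquad (4.7)$$

$$v_{x'y'} = \frac{1}{xy}\sum_h^L\frac{m_h}{m_h - 1}\sum_i^{m_h}\left(x_{hi} - \bar x_h\right)\left(y_{hi} - \bar y_h\right) \qquad (4.8)$$

where $x_{hi}$ is the unweighted sample total for the $hi$th psu, $\bar x_h = x_h/m_h$, and $x$ is the unweighted sample total. This follows readily by making the appropriate substitutions as follows:

$$f = m_hP_{hi}f_{hia}f_{hiaj}, \qquad \frac{x'_{hi}}{P_{hi}} = \frac{m_h}{f}x_{hi} \quad\text{with}\quad x_{hi} = \sum_a^{D_{hi}}\sum_j^{n_{hia}}\sum_k^{q_{hiaj}}x_{hiajk}$$

$$x'_h = \frac{1}{m_h}\sum_i^{m_h}\frac{x'_{hi}}{P_{hi}} = \frac{x_h}{f}, \qquad x' = \sum_h^Lx'_h = \frac{x}{f}$$

and similarly for the Y-characteristic. Further,

$$v_r^2 = v_{x'}^2 + v_{y'}^2 - 2v_{x'y'}$$

becomes

$$v_r^2 = \frac{1}{x^2}\sum_h^L\frac{m_h}{m_h - 1}\sum_i^{m_h}\left[\left(x_{hi} - \bar x_h\right) - r\left(y_{hi} - \bar y_h\right)\right]^2 \qquad (4.9)$$

Equation 2.7 of Ch. 7 becomes identical in form with Eq. 4.9 above if, in Eq. 2.7, the finite multiplier is close to 1 and if a uniform over-all sampling fraction is assumed.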
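The algebra leading to Eq. 4.9 can be checked numerically. The sketch below computes v_r^2 from hypothetical unweighted psu totals x_hi and y_hi in a self-weighting design in two ways: once in the compact form of Eq. 4.9, and once as v_x'^2 + v_y'^2 - 2v_x'y' from Eq. 4.6 to 4.8. The data layout (a dict keyed by stratum) and the function names are our own choices.

```python
def v_cross(a, b, tot_a, tot_b):
    """Eq. 4.6-4.8: sum_h m_h/(m_h-1) sum_i (a_hi - abar_h)(b_hi - bbar_h) / (tot_a * tot_b)."""
    s = 0.0
    for h in a:
        ah, bh = a[h], b[h]
        mh = len(ah)
        abar, bbar = sum(ah) / mh, sum(bh) / mh
        s += mh / (mh - 1) * sum((u - abar) * (w - bbar) for u, w in zip(ah, bh))
    return s / (tot_a * tot_b)

def v_r2(x_psu, y_psu):
    """Estimated rel-variance of r = x/y in the compact form of Eq. 4.9."""
    x = sum(sum(v) for v in x_psu.values())
    y = sum(sum(v) for v in y_psu.values())
    r = x / y
    total = 0.0
    for h in x_psu:
        xs, ys = x_psu[h], y_psu[h]
        mh = len(xs)
        xbar, ybar = sum(xs) / mh, sum(ys) / mh
        total += mh / (mh - 1) * sum(((u - xbar) - r * (w - ybar)) ** 2 for u, w in zip(xs, ys))
    return total / x ** 2

# hypothetical unweighted psu sample totals x_hi, y_hi, keyed by stratum h (m_h >= 2)
x_psu = {1: [52.0, 61.0, 47.0], 2: [80.0, 95.0], 3: [33.0, 29.0, 41.0, 36.0]}
y_psu = {1: [110.0, 120.0, 101.0], 2: [150.0, 171.0], 3: [70.0, 66.0, 85.0, 74.0]}
x, y = sum(map(sum, x_psu.values())), sum(map(sum, y_psu.values()))
three_terms = (v_cross(x_psu, x_psu, x, x) + v_cross(y_psu, y_psu, y, y)
               - 2 * v_cross(x_psu, y_psu, x, y))
print(v_r2(x_psu, y_psu), three_terms)
```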

*5. An estimate of the rel-variance of ratio estimates when only one primary unit is taken from each stratum (Vol. I, Ch. 9, Sec. 15, Eq. 15.1, and Sec. 28). a. When only one primary unit is selected from each stratum, it is not usually possible to obtain a consistent estimate of the variance. However, it is possible to obtain an estimate which tends to be an overstatement, and, for many practical problems, the overstatement will not be serious. The procedure is to combine the strata into $G$ groups with $L_g$ strata in the $g$th group. Assume that we are estimating the rel-variance of $r = x'/y'$, where $r$ is defined by Eq. 1.1.

To prove: We shall show that an estimate of the rel-variance of $r$ that tends to be an overestimate is

$$v_r^2 = v_{x'}^2 + v_{y'}^2 - 2v_{x'y'} \qquad (5.1)$$

where

$$v_{x'y'} = \frac{s_{x'y'}}{x'y'} \qquad (5.2)$$

$$s_{x'y'} = \sum_g^G\frac{L_g}{L_g - 1}\sum_h^{L_g}\left(x'_{gh} - \frac{A_{gh}}{A_g}x'_g\right)\left(y'_{gh} - \frac{B_{gh}}{B_g}y'_g\right) \qquad (5.3 \text{ or I-9.28.2})$$

$$x'_{gh} = \frac{x'_{gh(i)}}{P_{gh(i)}}$$

and where $x'_{gh(i)}$ is an unbiased estimate of a total for the $i$th psu in the sample from the $h$th stratum of the $g$th group, and $P_{gh(i)}$ is the corresponding probability of selecting the psu. Further, $Ex'_{gh} = X_{gh}$, and $A_{gh}$ is some measure associated with the $gh$th stratum that tends to be highly correlated with $X_{gh}$. Also $x'_g = \sum_hx'_{gh}$ and $A_g = \sum_hA_{gh}$. The $y'_{gh}$ and $B_{gh}$ are similarly defined, and $v_{x'}^2 = v_{x'x'}$ is given by Eq. 5.2 and 5.3 with the $y$'s replaced by the corresponding $x$'s.

* May be deferred.

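Proof. (1) We want to prove that $v_r^2$ tends to be an overstatement of $V_r^2$ and to indicate the mathematical conditions for the bias to be small. To do so, we assume for now, and prove in (2), that for the $g$th group, with $s_{gx'}^2 = \frac{L_g}{L_g - 1}\sum_h^{L_g}\left(x'_{gh} - a_{gh}x'_g\right)^2$ and $a_{gh} = A_{gh}/A_g$,

$$E\,s_{gx'}^2 = \frac{L_g}{L_g - 1}\left[\left(1 + \sum_h^{L_g}a_{gh}^2\right)\sigma_{gX}^2 - 2\sum_h^{L_g}a_{gh}\sigma_{ghX}^2 + \sum_h^{L_g}\left(X_{gh} - a_{gh}X_g\right)^2\right] \qquad (5.4)$$

where $\sigma_{ghX}^2$ is the variance of $x'_{gh}$, $\sigma_{gX}^2 = \sum_h\sigma_{ghX}^2$, and $X_g = \sum_hX_{gh}$. Since $\sum_ha_{gh} = 1$, Eq. 5.4 may also be written

$$E\,s_{gx'}^2 = \sigma_{gX}^2 + \frac{L_g}{L_g - 1}\left[\sigma_{gX}^2\sum_h^{L_g}\left(a_{gh} - \frac{1}{L_g}\right)^2 - 2\sum_h^{L_g}\left(a_{gh} - \frac{1}{L_g}\right)\sigma_{ghX}^2\right] + \frac{L_g}{L_g - 1}\sum_h^{L_g}\left(X_{gh} - a_{gh}X_g\right)^2$$

Now, for sufficiently large samples (from Sec. 15, Ch. 4), $E(s_{x'}^2/x'^2)$ is approximately equal to $\sum_gE\,s_{gx'}^2/X^2$, with $E\,s_{gx'}^2$ given by Eq. 5.4. We shall assume that the middle term of the second form of Eq. 5.4, which involves the deviations $a_{gh} - 1/L_g$, is small relative to $\sigma_{gX}^2$ (as will be the case if the strata do not vary widely in size within groups). It follows that

$$E\,v_{x'}^2 \doteq \frac{1}{X^2}\sum_g^G\sigma_{gX}^2 + \frac{1}{X^2}\sum_g^G\frac{L_g}{L_g - 1}\sum_h^{L_g}\left(X_{gh} - \frac{A_{gh}}{A_g}X_g\right)^2 \qquad (5.5)$$

and since the last term of Eq. 5.5 is positive, $v_{x'}^2$ tends to be an overstatement of $V_{x'}^2 = \sum_g\sum_h\sigma_{ghX}^2/X^2$.

Similarly, we can obtain

$$E\,v_{y'}^2 \doteq \frac{1}{Y^2}\sum_g^G\sigma_{gY}^2 + \frac{1}{Y^2}\sum_g^G\frac{L_g}{L_g - 1}\sum_h^{L_g}\left(Y_{gh} - \frac{B_{gh}}{B_g}Y_g\right)^2 \qquad (5.6)$$

and finally

$$E\,v_{x'y'} \doteq \frac{1}{XY}\sum_g^G\sigma_{gXY} + \frac{1}{XY}\sum_g^G\frac{L_g}{L_g - 1}\sum_h^{L_g}\left(X_{gh} - \frac{A_{gh}}{A_g}X_g\right)\left(Y_{gh} - \frac{B_{gh}}{B_g}Y_g\right) \qquad (5.7)$$

From Eq. 5.5, 5.6, and 5.7 we have

$$E\,v_r^2 \doteq E\,v_{x'}^2 + E\,v_{y'}^2 - 2E\,v_{x'y'} \doteq \frac{\sum_g\sigma_{gX}^2}{X^2} + \frac{\sum_g\sigma_{gY}^2}{Y^2} - \frac{2\sum_g\sigma_{gXY}}{XY} + \frac{1}{X^2}\sum_g^G\frac{L_g}{L_g - 1}\sum_h^{L_g}\left[\left(X_{gh} - \frac{A_{gh}}{A_g}X_g\right) - R\left(Y_{gh} - \frac{B_{gh}}{B_g}Y_g\right)\right]^2 \qquad (5.8)$$

The sum of the first three terms of Eq. 5.8 is $V_r^2$ and the last term is positive. Hence Eq. 5.1 tends to be an overstatement of the rel-variance of $r$.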

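(2) It remains to show that Eq. 5.4 holds. We consider the variance rather than the covariance for simplicity in notation, and the reader can follow the same steps to develop an analogous relationship for the covariance. With $a_{gh} = A_{gh}/A_g$,

$$E\sum_h^{L_g}\left(x'_{gh} - a_{gh}x'_g\right)^2 = \sum_h^{L_g}E\,x_{gh}'^2 - 2\sum_h^{L_g}a_{gh}E\left(x'_{gh}x'_g\right) + \sum_h^{L_g}a_{gh}^2E\,x_g'^2 \qquad (5.9)$$

Consider the first term in the right-hand member of Eq. 5.9. Since, in general, $Eu^2 = \sigma_u^2 + (Eu)^2$, it follows that

$$\sum_h^{L_g}E\,x_{gh}'^2 = \sum_h^{L_g}\left(\sigma_{ghX}^2 + X_{gh}^2\right) \qquad (5.10)$$

Consider next the second term in the right-hand member of Eq. 5.9. Since the $x'_{gh}$ are independent from stratum to stratum, the covariance of $x'_{gh}$ and $x'_g$ is $\sigma_{ghX}^2$, so that

$$\sum_h^{L_g}a_{gh}E\left(x'_{gh}x'_g\right) = \sum_h^{L_g}a_{gh}\left(\sigma_{ghX}^2 + X_{gh}X_g\right) \qquad (5.11)$$

and, finally, the third term in the right-hand member of Eq. 5.9 is given as follows:

$$\sum_h^{L_g}a_{gh}^2E\,x_g'^2 = \left(\sigma_{gX}^2 + X_g^2\right)\sum_h^{L_g}a_{gh}^2 \qquad (5.12)$$

Substitute Eq. 5.10, 5.11, and 5.12 into 5.9 to obtain

$$E\sum_h^{L_g}\left(x'_{gh} - a_{gh}x'_g\right)^2 = \sum_h^{L_g}\sigma_{ghX}^2 - 2\sum_h^{L_g}a_{gh}\sigma_{ghX}^2 + \sigma_{gX}^2\sum_h^{L_g}a_{gh}^2 + \sum_h^{L_g}X_{gh}^2 - 2X_g\sum_h^{L_g}a_{gh}X_{gh} + X_g^2\sum_h^{L_g}a_{gh}^2 \qquad (5.13)$$

Note that

$$\sum_h^{L_g}\sigma_{ghX}^2 = \sigma_{gX}^2 \qquad (5.14)$$

and

$$\sum_h^{L_g}X_{gh}^2 - 2X_g\sum_h^{L_g}a_{gh}X_{gh} + X_g^2\sum_h^{L_g}a_{gh}^2 = \sum_h^{L_g}\left(X_{gh} - a_{gh}X_g\right)^2 \qquad (5.15)$$

Substituting Eq. 5.14 and 5.15 into 5.13, collecting terms, and multiplying by $L_g/(L_g - 1)$, we obtain Eq. 5.4.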

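b. When the sample is self-weighting, the following relationships hold:

$$x'_{gh} = \frac{x'_{gh(i)}}{P_{gh(i)}} = \frac{x_{gh}}{f}, \qquad x' = \frac{x}{f}$$

and similarly for the $y$'s. Substituting these relationships into Eq. 5.3 and then into Eq. 5.2, we obtain

$$v_{x'y'} = \frac{1}{xy}\sum_g^G\frac{L_g}{L_g - 1}\sum_h^{L_g}\left(x_{gh} - \frac{A_{gh}}{A_g}x_g\right)\left(y_{gh} - \frac{B_{gh}}{B_g}y_g\right)$$

where $x_{gh}$ is the unweighted sample total for the $gh$th stratum and $x_g = \sum_hx_{gh}$.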
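The $x'_{gh}$ are independent from stratum to stratum. Eq. 5.4 can therefore be verified exactly for a toy group in which each $x'_{gh}$ takes only a few values. Here Eq. 5.4 is taken in the form E s_gx'^2 = [L_g/(L_g - 1)][(1 + Σ_h a_gh^2)σ_gX^2 - 2Σ_h a_gh σ_ghX^2 + Σ_h (X_gh - a_gh X_g)^2], with a_gh = A_gh/A_g. The distributions and size measures below are hypothetical. The second test illustrates the overstatement when the A_gh are equal: E s_gx'^2 then exceeds the true variance σ_gX^2.

```python
from fractions import Fraction as F
from itertools import product

def collapsed_s2(values, A):
    """s_gx'^2 = L/(L-1) * sum_h (x'_gh - (A_gh/A_g) x'_g)^2 for one group of collapsed strata."""
    L, A_g, x_g = len(values), sum(A), sum(values)
    return F(L, L - 1) * sum((v - F(a, A_g) * x_g) ** 2 for v, a in zip(values, A))

def exact_expected_s2(dists, A):
    """Exact E(s_gx'^2); dists[h] lists (probability, value) pairs for x'_gh, independent over h."""
    expected = F(0)
    for combo in product(*dists):
        p = F(1)
        for prob, _ in combo:
            p *= prob
        expected += p * collapsed_s2([v for _, v in combo], A)
    return expected

def eq_5_4(dists, A):
    """Closed form of E(s_gx'^2) as given in Eq. 5.4."""
    L, A_g = len(dists), sum(A)
    a = [F(x, A_g) for x in A]
    means = [sum(p * v for p, v in d) for d in dists]
    variances = [sum(p * (v - mu) ** 2 for p, v in d) for d, mu in zip(dists, means)]
    X_g, sigma2_g = sum(means), sum(variances)
    return F(L, L - 1) * ((1 + sum(ai ** 2 for ai in a)) * sigma2_g
                          - 2 * sum(ai * s for ai, s in zip(a, variances))
                          + sum((X - ai * X_g) ** 2 for X, ai in zip(means, a)))

# hypothetical group of three strata, one psu each; each x'_gh takes two values
dists = [[(F(1, 2), 10), (F(1, 2), 14)],
         [(F(1, 3), 20), (F(2, 3), 26)],
         [(F(1, 4), 5), (F(3, 4), 9)]]
A = [12, 25, 7]                       # hypothetical size measures A_gh
print(exact_expected_s2(dists, A), eq_5_4(dists, A))
```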


*6. Rel-variance for a self-weighting sample in terms of measures of homogeneity (Vol. I, Ch. 9, Eq. 17.1). To prove: The rel-variance as given by Eq. 1.26 can be restated as follows:

^2 ^ (6.1 or 

K? = -^ [1 + <5i(e -1)1 + ;;;^ i-9.i7.i) 

where $\delta_1$ and $\delta_2$ are measures of homogeneity defined by Eq. 6.3 and 6.6, and the associated rel-variances are defined by Eq. 6.4 and 6.7. In this section, $\bar q$ is the expected number of third-stage units in the sample per second-stage unit in the sample, and $\bar n$ is the expected number of second-stage units in the sample per first-stage unit in the sample.

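Proof. From Sec. 1, d,

$$V_r^2 \doteq \frac{B^2}{m} + \frac{W_b^2}{m\bar n} + \frac{\bar Q - \bar q}{\bar Q}\,\frac{W_w^2}{m\bar n\bar q} \qquad (6.2 \text{ or I-9.14.1})$$

where the terms in Eq. 6.2 are as defined in Sec. 1, d.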
Consider first $B^2$. By definition, for $M$ large,





From Eq. 6.3 and 6.4, 

= + 5i(5- 1)] 


Consider next $W_b^2$ and $W_w^2$. By definition,

wl~^ 

. _ _ _G 

^1 


(6.3 or 1-9.17.6) 
(6.4 or 1-9.17.2) 

(6.5) 


(6.6 or 1-9.17.8) 


* May be deferred. 





= fv! + Wl (6.7 or 1-9.17.7) 

From Eq. 6.6 and 6.7, 

( 6 . 8 ) 

and 

Wl^fld-d,) (6.9) 

Substitute Eq. 6.5, 6.8, and 6.9 into Eq. 6.2 and simplify to obtain Eq. 6.1. 


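*7. Optimum allocation for a fixed total expenditure for a self-weighting, three-stage stratified design (Vol. I, Ch. 9, Sec. 19 and 20). To prove: Consider a three-stage stratified sampling design that is self-weighting and in which the sampling within second-stage units is also constant. For such a design, the optimum $m$, $\bar n$, and $\bar q$ subject to a fixed total expenditure are given by the iterative formulas 7.3, 7.4, and 7.5 below. The rel-variance for the design is given by

$$V_r^2 \doteq \frac{B^2}{m} + \frac{W_b^2}{m\bar n} + \frac{\bar Q - \bar q}{\bar Q}\,\frac{W_w^2}{m\bar n\bar q} \qquad (7.1 \text{ or I-9.14.1})$$

where the terms are as defined in Sec. 1, d. The expected cost of the survey is given by

$$C = C_0\sqrt m + C_1m + C_2m\bar n + C_3m\sqrt{\bar n} + m\bar n\bar q \qquad (7.2)$$

The optimum values are given by

$$\bar q = \frac{W_w}{\sqrt{W_b^2 - W_w^2/\bar Q}}\sqrt{C_2 + \frac{C_3}{2\sqrt{\bar n}}} \qquad (7.3 \text{ or I-9.19.1})$$

$$a = \frac{1}{4}\left[k + \sqrt{k\left(k + 4C_1 + 4C_2\bar n + 4C_3\sqrt{\bar n} + 4\bar n\bar q\right)}\right] \qquad (7.4 \text{ or I-9.19.2})$$

where $k = C_0^2/C$ and $a = C_0/(2\sqrt m)$, and

$$\bar n = \frac{\sqrt{W_b^2 - W_w^2/\bar Q}}{B}\sqrt{\frac{C_1 + a + \frac{1}{2}C_3\sqrt{\bar n}}{C_2 + \frac{C_3}{2\sqrt{\bar n}}}} \qquad (7.5 \text{ or I-9.19.3})$$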

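Proof. To obtain the values of $m$, $\bar n$, and $\bar q$ which minimize the error subject to a fixed cost, set up the Lagrangian $F$, writing $V_r^2$ in the equivalent form $B^2/m + (W_b^2 - W_w^2/\bar Q)/(m\bar n) + W_w^2/(m\bar n\bar q)$:

$$F = \frac{B^2}{m} + \frac{W_b^2 - W_w^2/\bar Q}{m\bar n} + \frac{W_w^2}{m\bar n\bar q} + \lambda\left(C_0\sqrt m + C_1m + C_2m\bar n + C_3m\sqrt{\bar n} + m\bar n\bar q - C\right)$$

Then the solution to the equations $\partial F/\partial m = 0$, $\partial F/\partial\bar n = 0$, $\partial F/\partial\bar q = 0$, and the cost equation gives the optimum values:

$$\frac{\partial F}{\partial m} = -\frac{B^2}{m^2} - \frac{W_b^2 - W_w^2/\bar Q}{m^2\bar n} - \frac{W_w^2}{m^2\bar n\bar q} + \lambda\left(\frac{C_0}{2\sqrt m} + C_1 + C_2\bar n + C_3\sqrt{\bar n} + \bar n\bar q\right) = 0 \qquad (7.6)$$

$$\frac{\partial F}{\partial\bar n} = -\frac{W_b^2 - W_w^2/\bar Q}{m\bar n^2} - \frac{W_w^2}{m\bar n^2\bar q} + \lambda m\left(C_2 + \frac{C_3}{2\sqrt{\bar n}} + \bar q\right) = 0 \qquad (7.7)$$

$$\frac{\partial F}{\partial\bar q} = -\frac{W_w^2}{m\bar n\bar q^2} + \lambda m\bar n = 0 \qquad (7.8)$$

From Eq. 7.8:

$$\lambda m^2 = \frac{W_w^2}{\bar n^2\bar q^2} \qquad (7.9)$$

From Eq. 7.7:

$$\lambda m^2 = \frac{W_b^2 - W_w^2/\bar Q + W_w^2/\bar q}{\bar n^2\left(C_2 + \frac{C_3}{2\sqrt{\bar n}} + \bar q\right)} \qquad (7.10)$$

From Eq. 7.9 = 7.10:

$$\bar q = \frac{W_w}{\sqrt{W_b^2 - W_w^2/\bar Q}}\sqrt{C_2 + \frac{C_3}{2\sqrt{\bar n}}} \qquad (7.3)$$

Multiplying Eq. 7.6 by $m$ and subtracting Eq. 7.7 multiplied by $\bar n$, we obtain

$$\frac{B^2}{m} = \lambda m\left(\frac{C_0}{2\sqrt m} + C_1 + \frac{1}{2}C_3\sqrt{\bar n}\right) \qquad (7.11)$$

Substitute for $\lambda m^2$ from Eq. 7.9 and for $\bar q$ from Eq. 7.3, and let $C_0/(2\sqrt m) = a$, to obtain $\bar n$ as given by Eq. 7.5. Treating the cost function as a quadratic equation in $\sqrt m$, we obtain

$$\sqrt m = \frac{-C_0 + \sqrt{C_0^2 + 4C\left(C_1 + C_2\bar n + C_3\sqrt{\bar n} + \bar n\bar q\right)}}{2\left(C_1 + C_2\bar n + C_3\sqrt{\bar n} + \bar n\bar q\right)}$$

and $C_0/(2\sqrt m) = a$ is given by Eq. 7.4.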

Remark 1. Note that if we use the approximation to $V_r^2$ given by Eq. 6.1

with 

fu,, 1; = ,, = 




the optimum values are given by 




which is identical with Eq. 7.3 above, where $\delta_2$ is defined by Eq. 6.6, and




which is approximately equal to Eq. 7.5, above. 

Of course, $a$, given by Eq. 7.4, retains the same form since the cost function is unchanged.

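Remark 2. If the cost can be expressed by the simplified cost function

$$C = C_1m + C_2m\bar n + C_3m\bar n\bar q \qquad (7.12 \text{ or I-9.19.5})$$

the optimum values are given explicitly by

$$\bar n = \frac{\sqrt{W_b^2 - W_w^2/\bar Q}}{B}\sqrt{\frac{C_1}{C_2}} \qquad (7.13 \text{ or I-9.19.6})$$

$$\bar q = \frac{W_w}{\sqrt{W_b^2 - W_w^2/\bar Q}}\sqrt{\frac{C_2}{C_3}} \qquad (7.14 \text{ or I-9.19.7})$$

$$m = \frac{C}{C_1 + C_2\bar n + C_3\bar n\bar q} \qquad (7.15 \text{ or I-9.19.8})$$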


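Remark 3. If the precision of the survey is fixed, say at $V_r^2 = V_0^2$, the numbers of units in the sample yielding a minimum cost are given by Eq. 7.3 and 7.5 with $a$ now given by Eq. 7.16 below in place of Eq. 7.4:

$$a = \frac{C_0V_0}{2\sqrt{B^2 + \frac{1}{\bar n}\left(W_b^2 - \frac{W_w^2}{\bar Q}\right) + \frac{W_w^2}{\bar n\bar q}}} \qquad (7.16 \text{ or I-9.20.1})$$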




d. 


fiq 


with $V_x^2 + V_y^2 - 2V_{xy}$, where $V_{xy}$ is the within-strata rel-covariance for a simple random sample of listing units, $V_x^2 = V_{xx}$, and $V_y^2 = V_{yy}$. The development follows the same lines as given above and is left to the reader.
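Formulas 7.3 to 7.5 are meant to be applied iteratively. The Python sketch below shows one way to do this; it is an illustration, not the book's procedure. It uses hypothetical values for $B^2$, $W_b^2$, $W_w^2$, $\bar Q$ and the cost coefficients (c0 to c3 correspond to $C_0$ to $C_3$ of Eq. 7.2). It also carries an explicit cost c4 per third-stage unit; c4 = 1 gives the cost function of Eq. 7.2 as printed. The iteration starts from $\bar n = 1$ and $a = 0$ and stops when successive values agree. It does not enforce $\bar q \le \bar Q$ or integer sample sizes.

```python
import math

def rel_variance(m, nbar, qbar, B2, Wb2, Ww2, Qbar):
    """Eq. 7.1 (same form as Eq. 1.26)."""
    return B2 / m + Wb2 / (m * nbar) + (Qbar - qbar) / Qbar * Ww2 / (m * nbar * qbar)

def survey_cost(m, nbar, qbar, c0, c1, c2, c3, c4=1.0):
    """Eq. 7.2 with an explicit cost c4 per third-stage unit (c4 = 1 reproduces Eq. 7.2)."""
    return (c0 * math.sqrt(m) + c1 * m + c2 * m * nbar
            + c3 * m * math.sqrt(nbar) + c4 * m * nbar * qbar)

def m_for_budget(C, nbar, qbar, c0, c1, c2, c3, c4=1.0):
    """Root of the cost equation viewed as a quadratic in sqrt(m)."""
    D = c1 + c2 * nbar + c3 * math.sqrt(nbar) + c4 * nbar * qbar
    sqrt_m = (-c0 + math.sqrt(c0 ** 2 + 4 * C * D)) / (2 * D)
    return sqrt_m ** 2

def q_opt(nbar, Ww2, Wb2_adj, c2, c3, c4):
    """Eq. 7.3 (divided by c4 when the third-stage unit cost is not 1)."""
    return math.sqrt(Ww2 / Wb2_adj * (c2 + c3 / (2 * math.sqrt(nbar))) / c4)

def optimum_allocation(C, B2, Wb2, Ww2, Qbar, c0, c1, c2, c3, c4=1.0,
                       tol=1e-12, max_iter=1000):
    """Fixed-point iteration on Eq. 7.3-7.5; returns (m, nbar, qbar)."""
    Wb2_adj = Wb2 - Ww2 / Qbar                 # W_b^2 - W_w^2 / Qbar
    if Wb2_adj <= 0:
        raise ValueError("Eq. 7.3 and 7.5 require W_b^2 > W_w^2 / Qbar")
    nbar, a = 1.0, 0.0                         # starting values
    for _ in range(max_iter):
        g = c2 + c3 / (2 * math.sqrt(nbar))
        n_new = math.sqrt(Wb2_adj / B2 * (c1 + a + 0.5 * c3 * math.sqrt(nbar)) / g)   # Eq. 7.5
        q_new = q_opt(n_new, Ww2, Wb2_adj, c2, c3, c4)                                # Eq. 7.3
        m_new = m_for_budget(C, n_new, q_new, c0, c1, c2, c3, c4)
        a_new = c0 / (2 * math.sqrt(m_new))                                           # Eq. 7.4
        done = abs(n_new - nbar) < tol and abs(a_new - a) < tol
        nbar, a = n_new, a_new
        if done:
            break
    qbar = q_opt(nbar, Ww2, Wb2_adj, c2, c3, c4)
    return m_for_budget(C, nbar, qbar, c0, c1, c2, c3, c4), nbar, qbar

params = dict(B2=0.05, Wb2=0.5, Ww2=2.0, Qbar=20.0)     # hypothetical rel-variances
m, nbar, qbar = optimum_allocation(10000.0, c0=10.0, c1=50.0, c2=5.0, c3=2.0, **params)
print(round(m, 2), round(nbar, 3), round(qbar, 3), rel_variance(m, nbar, qbar, **params))
```

With c0 = c3 = 0 the iteration reduces to the explicit formulas of Remark 2 (with c4 playing the role of $C_3$ there), which gives a convenient check.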


*8. The variance of ratio estimates by specific subclasses that can make use of both current and past information (Vol. I, Ch. 9, Sec. 2.1, and Vol. I, Ch. 12, Case Study B). Let $x'_\alpha$ and $y'_\alpha$ be unbiased estimates of $X_\alpha$ and $Y_\alpha$, which are aggregate values for the $\alpha$th subclass of a population.


* May be deferred. 





and where $V_u^2 = E(u - Eu)^2/(Eu)^2$ is the rel-variance of $u$, and $V_{uw} = E(u - Eu)(w - Ew)/[(Eu)(Ew)]$ is the rel-covariance of $u$ and $w$, where $u$ and $w$ are the various random variables indicated by the subscripts in Eq. 8.1.

The estimate of the variance is obtained by substituting sample estimates 
for each of the terms in Eq. 8.1 above. 


The danger of increasing the variance if an estimate of the type given by $x'$ is carried into too many subclasses is discussed in Ch. 5, Sec. 13.

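*9. Reduction in variance due to stratification, when psu's are large (Vol. I, Ch. 9, Sec. 8). To prove: Stratification will usually produce a relatively larger reduction in the primary unit contribution to the variance when the psu's are large than when they are small. It is assumed that the same strata are used for both large and small psu's.

Proof. Assume that we have $L$ strata. Let us denote the average size of the large clusters within a stratum by $\bar N_L$ and assume that $a\bar N_L = \bar N_S$, the average size of the small clusters, where $a < 1$.

If we let $\bar X_h = X_h/M_h$ and $\bar X = X/M$, then, for a stratified simple random sample of psu's,

$$V_{x'}^2 = \frac{1}{X^2}\sum_h^LM_h^2\,\frac{M_h - m_h}{M_hm_h}\cdot\frac{\sum_i^{M_h}\left(X_{hi} - \bar X_h\right)^2}{M_h - 1} \qquad (9.1)$$

when $m_h = kM_h$, where $m_h$ is the number of psu's in the sample from the $h$th stratum and $M_h$ is the number of psu's in the population from the $h$th stratum, and

$$\frac{\sum_h^L\sum_i^{M_h}\left(X_{hi} - \bar X\right)^2}{M\bar X^2} = \frac{\sum_h^L\sum_i^{M_h}\left(X_{hi} - \bar X_h\right)^2}{M\bar X^2} + \frac{\sum_h^LM_h\left(\bar X_h - \bar X\right)^2}{M\bar X^2}$$

If the $M_h$ are large (and the sampling fraction $k = m/M$ is small), then

$$V_{x'}^2 \doteq \frac{1}{m}\cdot\frac{\sum_h^L\sum_i^{M_h}\left(X_{hi} - \bar X_h\right)^2}{M\bar X^2} \qquad (9.2)$$

But the rel-variance of a simple random sample of $m$ psu's is approximately

$$\frac{1}{m}\cdot\frac{\sum_h^L\sum_i^{M_h}\left(X_{hi} - \bar X\right)^2}{M\bar X^2}$$

and the relative gain due to stratification is

$$\frac{\sum_h^LM_h\left(\bar X_h - \bar X\right)^2}{M\bar X^2}\Bigg/\frac{\sum_h^L\sum_i^{M_h}\left(X_{hi} - \bar X\right)^2}{M\bar X^2} \qquad (9.3)$$

* May be deferred.

Assume that in the equations given above we let $M_h = M_{hL}$ for the number of large clusters in the $h$th stratum and $M_h = M_{hS}$ for the number of small clusters in the $h$th stratum. Moreover, we let $M = M_L$ for the number of large clusters in the population, and $M = M_S$ for the number of small clusters in the population. Then, since $M_{hS} = M_{hL}/a$, it follows that $M_S = M_L/a$. Similarly, $\bar X_{hS} = a\bar X_{hL}$ and $\bar X_S = a\bar X_L$. Therefore,

$$\frac{\sum_h^LM_{hL}\left(\bar X_{hL} - \bar X_L\right)^2}{M_L\bar X_L^2} = \frac{\sum_h^LM_{hS}\left(\bar X_{hS} - \bar X_S\right)^2}{M_S\bar X_S^2}$$

or the numerator of Eq. 9.3 remains the same for the change in the average size of cluster. From Ch. 6, Eq. 5.6, the denominator of Eq. 9.3 can be written as

$$\frac{V_L^2}{\bar N_L}\left[1 + \delta_L\left(\bar N_L - 1\right)\right] \qquad (9.4)$$

for large clusters, and as

$$\frac{V_S^2}{\bar N_S}\left[1 + \delta_S\left(\bar N_S - 1\right)\right] \qquad (9.5)$$

for small sizes of clusters, and the ratio of Eq. 9.4 to Eq. 9.5 is

$$\frac{V_L^2}{V_S^2}\cdot\frac{\delta_L + \left(1 - \delta_L\right)/\bar N_L}{\delta_S + \left(1 - \delta_S\right)/\bar N_S} \qquad (9.6)$$

It follows, if $V_L^2 = V_S^2$ and $\delta_L = \delta_S < 1$, that Eq. 9.6 is less than unity (since $\bar N_L > \bar N_S$), and hence the relative gain from the stratification is greater for large clusters than for small clusters.

The reader can show that in the case where the estimate is $x'/y' = r$, the relative gain is given by

$$\frac{\dfrac{\sum_hM_h(\bar X_h - \bar X)^2}{M\bar X^2} + \dfrac{\sum_hM_h(\bar Y_h - \bar Y)^2}{M\bar Y^2} - \dfrac{2\sum_hM_h(\bar X_h - \bar X)(\bar Y_h - \bar Y)}{M\bar X\bar Y}}{\dfrac{\sum_h\sum_i(X_{hi} - \bar X)^2}{M\bar X^2} + \dfrac{\sum_h\sum_i(Y_{hi} - \bar Y)^2}{M\bar Y^2} - \dfrac{2\sum_h\sum_i(X_{hi} - \bar X)(Y_{hi} - \bar Y)}{M\bar X\bar Y}} \qquad (9.7)$$

and when $V^2$ is defined by Eq. 5.5 of Ch. 6, the relationship in Eq. 9.6 holds. It is for the ratio estimate that the assumption that $V_L^2 = V_S^2$ will usually hold.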
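The argument of Sec. 9 can be illustrated with a simulated population. In the hypothetical sketch below, each large cluster of eight elements is split into two clusters of four (a = 1/2). Element values share a cluster effect, so the measure of homogeneity is positive. The numerator and denominator of Eq. 9.3 are then computed for both cluster sizes. The seed, sizes, and distributions are arbitrary choices, and the function names are our own.

```python
import random

def gain_components(strata):
    """Numerator and denominator of Eq. 9.3 for a list of strata of psu totals X_hi."""
    values = [x for s in strata for x in s]
    M = len(values)
    Xbar = sum(values) / M
    between = sum(len(s) * (sum(s) / len(s) - Xbar) ** 2 for s in strata) / (M * Xbar ** 2)
    total = sum((x - Xbar) ** 2 for x in values) / (M * Xbar ** 2)
    return between, total

def build_population(seed=7, n_strata=4, clusters=12, size=8):
    """Hypothetical population: each large cluster of `size` elements is split into two halves."""
    rng = random.Random(seed)
    large, small = [], []
    for h in range(n_strata):
        lh, sh = [], []
        for _ in range(clusters):
            effect = rng.gauss(0, 2)       # shared cluster effect -> positive homogeneity
            elems = [20 + 4 * h + effect + rng.gauss(0, 3) for _ in range(size)]
            lh.append(sum(elems))
            sh += [sum(elems[: size // 2]), sum(elems[size // 2:])]
        large.append(lh)
        small.append(sh)
    return large, small

large, small = build_population()
bL, tL = gain_components(large)
bS, tS = gain_components(small)
print(f"large clusters: gain {bL / tL:.3f}   small clusters: gain {bS / tS:.3f}")
```

The numerator is the same for the two cluster sizes, as shown above. The denominator, however, is never smaller for the half-size clusters, because the square of a sum of two deviations is at most twice the sum of their squares. So in this construction the relative gain for the large clusters is at least as large.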


*10. Consistent estimates of the components of the rel-variance when more than one psu is selected with replacement from each stratum (Vol. I, Ch. 9, Sec. 30, 31, and 32). We will consider a three-stage stratified sampling design for which the estimate is given by Eq. 1.1 and 1.16 and the rel-variance is given by Eq. 1.20. Assume that we have a sample of $m_h$ psu's from the $h$th stratum, $n_{hia}$ second-stage units from the $a$th substratum in the $hi$th psu, and $q_{hiaj}$ third-stage units in the $hiaj$th second-stage unit, for estimating the components of the rel-variance.

* May be deferred.

a. To prove: A consistent estimate of the contribution to the rel-variance 
due to sampling third-stage units, i.e., the third term of Eq. 1.20, is given by 


1 41 I N,„„ < 

f 

^ hi a nf^ia rij^ia j 


V - - V - V tlM2: ilMS V /^2 Qhiaj 9hiaj ^hiai .n 

f m, m'„ t E|, f . 4 7- (10.1) 


where 

with 


Qhiaj ^hiaj 


^hiaj ^hiajX “h ^^^hiajY '^^^hiajXY 

2 ^^hia}k ^hiaj)(^t/fiiajk yhiaj^ 


Q hta) 

I 

k 


^hiajXY 


/ .. 

^Maj ^ 


( 10 . 2 ) 


(10.3) 


'hiaj 


X-u ■ -I 

_ 2 ^hia)k , 

^ ^hiai 




'I'' Vum 

^ yhiaj 


^hiajX ^ hiaj XX ^hiajY — ^hiajYY 

We will show that Eq. 10.1 with a;' replaced by V and replaced by 
^hiaj (defined in Eq. 1.23) is an unbiased estimate of the third-stage 
contribution to the rel-variance. Then we need to show that is a 
consistent estimate of and sl^^j is a consistent estimate of for 
Eq. 10.1 to be a consistent estimate of the third term of Eq. 1.20. ^*' 
^Proof. The expected value of Eq. 10.1 with x' substituted for X and 
substituted for is, by Theorems 6 and 14, Ch. 3 (pp. 49 and 611 
equal to . 


1 1 1 1 N.- Nr.. n ^ e 2 

~ 2- y y £ —— y £ . — M2 — M2 y £ q 2 Qhiaj ^Maj ^hiaj 

r>2 ^ , 2. ^hiajHhiaj y, - - 


1 1 1 I Nr,. Nr,. 1 ^ 

— - \ _V Vp y ^^hta ^^hia 1 _ 

^ ^-^hi n2 ^ ~~t ' Z, 

h ^h ^h i i Phi a ^1r,ir, ft 


pf ^ M yj' AT 2 Qhiaj 

^ht a fihia ^hia 3 ^hia j 


hiaj ^Maj 

Qhiaj ^hiaj ^hiaj 
Qhiaj ^hiaj 

(10.4) 

^ Now, by Sec. 21 of Ch. 4, r is a consistent estimate of R, Since the 
q^iaj units are a simple random sample from the units in the popula¬ 
tion, is an unbiased estimate of (by Sec. 3 , Ch. 4 ). 


= the third term of Eq. 1 .20 







Similarly, $s_{hiajX}^2$ is an unbiased estimate of $S_{hiajX}^2$ and $s_{hiajY}^2$ is an unbiased estimate of $S_{hiajY}^2$. Since also $x'$ is an unbiased estimate of $X$, then, by the corollary to Theorem 19 and Corollary 2 to Theorem 20, Ch. 3 (p. 75), Eq. 10.1 is a consistent estimate of the third term of Eq. 1.20.

b. To prove: A consistent estimate of the contribution to the rel-variance due to sampling second-stage units, i.e., the second term of Eq. 1.20, is given by

^ ^ r>2 ^ 


where 

and 

with 


^ ^ h fn. m. i Pm a 


N 


Jiia 


'ft "'ft i ^ hi 
^hia 

^hiaX + ^^^hiaY ~~ '^^^hiaXY 


^hia 


^hia ^hia 


v2 

‘^hia 


^hiaXY ~ 


^hiaj 


^ / f \ 

2 ^Mai ^hi^^hiaj VniA 
^hia i 


. Q hial 


Qhiaj sr ^ 

t 2-, '^hiajk 

^hiaj ^ 


(10.5) 


( 10 . 6 ) 

(10.7) 

( 10 . 8 ) 

(10.9) 


rt hta ^ 

2! ^hiaj 


^hia = 


^hia 


y'kiai and y'nia similarly defined, and 

slaX = ShiaXX and sh^Y = 4jaFI' 

Qhiaj ~3MM 


and where 


IQ 


■ 

■htai 


J 

^hia = 


'hiaj 


^hiaj 


^hia 


( 10 . 10 ) 


( 10 . 11 ) 


Proof. It follows from the same considerations as in Eq. 10.4 that the expected value of Eq. 10.5 with $x'$ replaced by $X$ and $s_{hia}^2$ replaced by $S_{hia}^2$ (defined in Eq. 1.22) is the second term of Eq. 1.20. We wish
now to show that is a consistent estimate of By i>ec. ^n. o, 

we have 

2 Qhiaj ~ ^hiaj ^hiajXY 
X Qhiaj 7r~ V . . 

_ ^hiaj _ Hhtai 

EniahiaXY ^hiaXY M 

' hta 


( 10 . 12 ) 



By Theorems 6 and 14, Ch. 3 (pp. 49 and 61), and Sec. 3, Ch. 4,


^hic^MaXY 


Nhia n _ 

y f)2 ^hiaj Hhiaj ^hiajXY 

^hiaj ^ t 

^ _ ^hiaj ^hiaj 


(10.13) 


Hence, subtracting Eq. 10.13 from Eq. 10.12, it follows that 

^hia^hiaXT ^Ma^MaXY ^hia^hiaXY ~ ^hiaX 


(10.14) 


Similarly, $s_{hiaX}^2$ is an unbiased estimate of $S_{hiaX}^2$ and $s_{hiaY}^2$ is an unbiased estimate of $S_{hiaY}^2$. Since also $r$ is a consistent estimate of $R$, $s_{hia}^2$ is a consistent estimate of $S_{hia}^2$.

c. To prove: A consistent estimate of the contribution to the rel-variance due to sampling first-stage units, i.e., the first term of Eq. 1.20, is given by


where 


1 ^ 1 
I ^ 


si = si- si 


'hxr — ~7 


Z = T^J.x,, 

Cl, 'J 


'Ir — 2rs, 

iXY 


fZ\, 

( 

y'y'hi 

iPni 

y^_ 

i Pm 

rriu r 


m'h 


(10.15) 

(10.16) 

(10.17) 

(10.18) 
(10.19) 


and given by Eq. 10.9. The definition of «/;„ is similar to that of a;!.. 
Finally, si in Eq. 10.16 is 


1 m'n ] Dhi\r 2 r ^ __ f / 

v 2 _ _ y ^ y hta ^^hia ^hia ^2 1 ^hia -2 

/ • m' 

'ft I ^ hi a L 


+ ( 10 . 20 ) 


Proof. From Sec. 4 it follows that 4; | — 4 is a consistent estimate 

, * * 

of Vf the rel-variance of r for a sample of m„, and sampling 

units, i.e., f; 2 is given by Eq. 1.20 with replaced by and 
replaced by 




1^1.. 

From Part b above it follows that 2 is ^ consistent 

'V* “ 7 


estimate of: 


x/^9 2 2 r) 2 ^ 

h i Phi a ^hia 


n 


hia 


1 L } Mh ] ^f‘i Nr.- Nr.- 
1 1 1 JizQ, hzci 


I y_y_y—! 

^ X^tm^iPu a n^iu 


^hia 


Nu 


N: 


hia 


2 Qhiai 

3 


Qhiai ^hiaj ^hiai 


-hiai 


^hiaj 


, i V i V-Lv’ n2 (10.21) 

+ NTl 2 ^ 2 D 2 2 iJhiai n n ^ 

h f^h i hi a Mhiai ^hiaj 


Equation 10.21 can be shown to be equal to the sum of the second and 
third terms of Eq. 1.20 but with «*<„ replaced by and replaced by 
^hiaj' Therefore, a consistent estimate of the total rel-variance minus a 
consistent estimate of the second and third terms combined will be a 
consistent estimate of the contribution to the rel-variance due to sampling 
first-stage units, i.e., 


L 4 J. __lyi 

h m. x'^ h rn. ^ n rrij^ 


Sh 


is a consistent estimate of the first term of which is the same as the 
first term of 


*11. Optimum values for a three-stage stratified design (Vol. I, Ch. 9, Sec. 6 and 26). To prove: The values of $m_h$, the number of psu's in the sample from the $h$th stratum, of $f_{hia} = n_{hia}/N_{hia}$, the sampling fractions for second-stage units, and of $f_{hia(j)}$, the sampling fractions for third-stage units, with

$$f_{hia(j)} = \frac{\bar q_{hia}}{\bar Q_{hia}}, \qquad \text{where } \bar Q_{hia} = \frac{\sum_j^{N_{hia}} Q_{hiaj}}{N_{hia}}$$

(i.e., the sampling fractions for third-stage units are assumed to be constant for all second-stage units in the $hia$th substratum), which minimize the variance subject to a fixed total cost, are given in Eq. 11.7 to 11.10 below. The cost condition is given in Eq. 11.5 below. It will also

be shown that a uniform sampling fraction is optimum when ^wMal^^uia 
is a constant for all h, /, and a. 

The rel-variance of x'jy' for a general three-stage stratified design with 
varying probabilities of selection of primary units is given by Eq. 1.20 in 
Sec. 1. If we specify that a uniform third-stage sampling fraction be 


* May be deferred. 




applied to all second-stage units within a substratum of a selected psu, 
i.e., if we let

$$\frac{q_{hiaj}}{Q_{hiaj}} = f_{hia(j)}$$

the rel-variance may be written 


ihia 


f 


Mai 


Kw = 


1 r ^ 1 L I I D 

^ L /i h i ^M a 


2 7^“('SL- QuJlua) 

JMa 


+ 2-2^2 


1 


where 


'whia 


h % Phi a fhia Jhiaij) 

Nhia 

2! QMaj^Maj 
.J__ 


Qhi 


Ma^wMa 


(11.1) 


( 11 . 2 ) 


with $S_{hiaj}^2$ given by Eq. 1.23, and where

$$S_{hia}^2 = S_{hiaX}^2 + R^2 S_{hiaY}^2 - 2R\,S_{hiaXY} \qquad (11.3)$$

which is also given by Eq. 1.22, 

Mh 2 

~ 2 2 ^Ma^hia ( 11 - 4 ) 

i ^M a 

where 81 is given by Eq. 1.21, and other terms are as defined in Sec. 1. 
Assume the following cost function: 

$$C = \sum_h^L C_{1h} m_h + \sum_h^L m_h \sum_i^{M_h} P_{hi}\sum_a^{D_{hi}} n_{hia} C_{2hia} + \sum_h^L m_h \sum_i^{M_h} P_{hi}\sum_a^{D_{hi}} n_{hia}\bar q_{hia} C_{3hia} \qquad (11.5)$$

where $C_{1h}$ is the cost per primary unit for the $h$th primary stratum, $C_{2hia}$ is the cost per second-stage unit in the $a$th substratum of the $i$th primary unit in the $h$th primary stratum, $C_{3hia}$ is the cost per listing unit for the $hia$th substratum, and $\bar q_{hia}$ is the expected number of listing units in the sample per second-stage unit in the sample in the $a$th substratum of the $hi$th psu.

The cost function may also be written as follows: 

$$C = \sum_h^L C_{1h} m_h + \sum_h^L m_h \sum_i^{M_h} P_{hi}\sum_a^{D_{hi}} N_{hia} f_{hia} C_{2hia} + \sum_h^L m_h \sum_i^{M_h} P_{hi}\sum_a^{D_{hi}} N_{hia} f_{hia}\bar Q_{hia} f_{hia(j)} C_{3hia} \qquad (11.6)$$




The optimum values subject to a fixed total cost are given as follows. 

VIV r.. 


f ==-L 1^ 

Jhia p Nr' 

^ hi 

fhia(j) 




Qhia^' 


whia 


2hia 


A. 




02 

^whia 


where 


^SMa ^hia Qhia^whia 

1 /X _ LMj^Dhi 

Va = MI + 222 

C \ h h i a 


(11.7) 

( 11 . 8 ) 

(11.9) 


Qhia ^whia 


^2hia 


L Mh Dm ^ 

2 2 2 Qhia^whia 

h i a 




( 11 . 10 ) 


Proof. To obtain the optimum values we set up the Lagrangian $F$:

$$F = V^2 + \lambda(\text{cost function} - C)$$

where $V^2$ is the rel-variance given by Eq. 11.1. Then the solution of the equations $\partial F/\partial m_h = 0$, $\partial F/\partial f_{hia} = 0$, $\partial F/\partial f_{hia(j)} = 0$, and the cost condition, Eq. 11.6, yields the optimum values given in Eq. 11.7 to 11.10. The process of obtaining the solution, using Lagrange multipliers, is illustrated in Sec. 7 above and in Ch. 5, Sec. 9, and is left to the reader.

Note that the optimum over-all sampling fraction is a uniform over-all 
sampling fraction when is constant, since for a uniform over-all 

sampling fraction 

$$f = m_h P_{hi} f_{hia} f_{hia(j)}$$

and, substituting the optimum values for $m_h$, $f_{hia}$, and $f_{hia(j)}$, we obtain

/- S. 

Vxf= 


^wMa 


■Kfr. 


12. Adjustment for changes in probabilities when initial sample is selected with varying probabilities.* Given a population classified into strata, one unit is selected from each stratum with a specified probability. It is desired to determine a method for drawing a sample of units with probabilities differing from the original probabilities of selection, but still retaining a maximum number of the originally selected units in the sample. The method is applicable, for example, when probability proportionate to a


* This result is due to Keyfitz (3).




measure of size has been used in the initial selection of a sample, the 
measures of size have since been brought up to date, and it is desired to 
redraw the sample, using the more recent measures of size, and still 
retain as much as possible of the original sample. 

Illustration of method: Consider one of the strata, and assume that the population consists of units A, B, C, and D with original probabilities of selection equal to $\alpha$, $\beta$, $\gamma$, and $\delta$ and new probabilities equal to $a$, $b$, $c$, and $d$. Assume that $a > \alpha$, $b > \beta$, $c < \gamma$, and $d < \delta$. In this case, if either A or B was chosen originally, it would be retained in the sample. However, if C or D was originally selected, some chance of rejecting it must be introduced. The appropriate probability of rejecting C or D would be $(\gamma - c)/\gamma$ or $(\delta - d)/\delta$, respectively.

Suppose that C is the one which was originally selected. We may then determine from a table of random numbers whether or not to reject it, by selecting a random number between 0 and 1. C is rejected if the random number is between 0 and $(\gamma - c)/\gamma$. If the random number is greater than $(\gamma - c)/\gamma$, the original sample selection is retained.

If we have determined that C (or D) is to be rejected, our next problem is to choose between A and B. The choice is made by selecting another random number between 0 and 1 and determining whether or not it is between 0 and $(a - \alpha)/(a - \alpha + b - \beta)$. If so, A is selected; if not, B is selected.

The proof that this method yields a sample with probabilities of selection 
proportionate to a, b, c, d, and that it results in the minimum probability 
of change from the original sample, is left to the reader. The extension to 
any number of units in the stratum is immediate. 
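The rejection-and-replacement rule can be sketched for a single stratum with any number of units. This is a minimal sketch under stated assumptions, not Keyfitz's own formulation: the function names `keyfitz_transition`, `final_probabilities`, and `redraw` are hypothetical, and probabilities are held as exact fractions so that the check that the final selection probabilities equal the new ones is exact. A unit whose probability did not fall is always kept; a unit whose probability fell is rejected with probability (old minus new)/old and replaced by one of the gaining units in proportion to its gain, which for two gainers is the rule $(a - \alpha)/(a - \alpha + b - \beta)$ given above.

```python
from fractions import Fraction
import random


def keyfitz_transition(old, new):
    """P(final unit = j | original unit = i) for one stratum.

    old, new: dicts mapping unit -> selection probability (each sums to 1).
    """
    gains = {u: new[u] - old[u] for u in old if new[u] > old[u]}
    total_gain = sum(gains.values())
    trans = {}
    for i in old:
        row = {j: Fraction(0) for j in old}
        if new[i] >= old[i]:
            row[i] = Fraction(1)                 # probability did not fall: always kept
        else:
            reject = (old[i] - new[i]) / old[i]  # e.g. (gamma - c) / gamma
            row[i] = 1 - reject
            for j, g in gains.items():           # replacement in proportion to gains
                row[j] += reject * g / total_gain
        trans[i] = row
    return trans


def final_probabilities(old, new):
    """Unconditional selection probabilities after the adjustment."""
    trans = keyfitz_transition(old, new)
    return {j: sum(old[i] * trans[i][j] for i in old) for j in old}


def redraw(original, old, new, rng):
    """Apply the two random-number steps to the originally selected unit."""
    if new[original] >= old[original]:
        return original
    if rng.random() >= (old[original] - new[original]) / old[original]:
        return original                          # not rejected
    gainers = [(u, new[u] - old[u]) for u in old if new[u] > old[u]]
    r = rng.random() * float(sum(g for _, g in gainers))
    for u, g in gainers:
        if r < g:
            return u
        r -= g
    return gainers[-1][0]                        # guard against rounding
```

For example, with old probabilities 1/10, 2/10, 3/10, 4/10 and new probabilities 2/10, 3/10, 1/4, 1/4 for A, B, C, D, `final_probabilities` returns exactly the new probabilities, and C is retained with probability 5/6.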


REFERENCES 

(1) M. H. Hansen and W. N. Hurwitz, “On the Theory of Sampling from 
Finite Populations,” Annals Math. Stat., 14 (1943), 332-362. 

(2) Emil H. Jebe, “Estimation for Subsampling Designs Employing the County 
as a Primary Sampling Unit,” J. Amer. Stat. Assn., 47 (1952), 49-70. 

(3) Nathan Keyfitz, “Sampling with Probabilities Proportional to Size: Adjustment for Changes in the Probabilities,” J. Amer. Stat. Assn., 46 (1951),



CHAPTER 10 


Estimating Variances 


DERIVATIONS, PROOFS, AND SOME EXTENSIONS OF 
THEORY FOR CH. 10 OF VOL. I* 

Note. This chapter contains some relationships (in addition to those given 
in Ch. 4) useful for evaluating the precision of variance estimates. Some 
methods are also given for simplifying the estimation of variances from sample 
returns. Additional methods are given in Ch. 10 of Vol. I. 

1. The rel-variance of the estimate of the rel-variance and of the coefficient 
of variation with simple random sampling (Vol. I, Ch. 10, Sec. 5). a. The 
rel-variance of the estimated rel-variance based on a simple random 
sample of n units drawn with replacement from a population of N units 
is 

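$$V^2_{v^2} \approx \frac{\beta - 1}{n} + \frac{4V^2}{n} - \frac{4\mu_3}{n\bar X^3 V^2} \qquad (1.1)$$

where

$$v^2 = \frac{s^2}{\bar x^2}, \qquad s^2 = \frac{\sum_i^n (x_i - \bar x)^2}{n - 1}, \qquad \bar x = \frac{\sum_i^n x_i}{n}$$

$$V^2 = \frac{\sigma^2}{\bar X^2}, \qquad \sigma^2 = \frac{\sum_i^N (X_i - \bar X)^2}{N}, \qquad \bar X = \frac{\sum_i^N X_i}{N}$$

$$\mu_3 = \frac{\sum_i^N (X_i - \bar X)^3}{N}, \qquad \mu_4 = \frac{\sum_i^N (X_i - \bar X)^4}{N}, \qquad \beta = \frac{\mu_4}{\sigma^4}$$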


* Appropriate references to Vol. I are shown in parentheses after section or 
subsection headings. The number following I- after some equations gives 
the chapter, section, and number of that particular equation in Vol. I. 


and where $x_i$ is the value for the $i$th unit in the sample, and $X_i$ is the value for the $i$th unit in the population.

The proof follows the same steps outlined in Sec. 18 of Ch. 4 for 

deriving the rel-variance of the estimated variance of a ratio, and is left 
to the reader. 

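b. The rel-variance of the estimated coefficient of variation based on a simple random sample drawn with replacement is

$$V^2_v \approx \frac{1}{4}V^2_{v^2} = \frac{\beta - 1}{4n} + \frac{V^2}{n} - \frac{\mu_3}{n\bar X^3 V^2} \qquad (1.2 \text{ or I-10.5.1})$$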


This follows immediately from Part a and from Sec. 6, Ch. 4. 

Sampling with replacement is assumed here for simplicity and will be 
a good approximation for sampling without replacement whenever the 
number of sampling units in the population is large relative to the number 
in the sample. 
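The first-order approximations can be evaluated directly for a given population. This sketch assumes Eq. 1.1 has the form $(\beta - 1)/n + 4V^2/n - 4\mu_3/(n\bar X^3V^2)$, with $\beta = \mu_4/\sigma^4$ and $V^2 = \sigma^2/\bar X^2$ computed on divisor $N$, and that Eq. 1.2 is one-fourth of it; the function names are hypothetical.

```python
def population_moments(values):
    """Return (mean, sigma^2, mu_3, mu_4) using divisor N."""
    N = len(values)
    mean = sum(values) / N
    sigma2 = sum((x - mean) ** 2 for x in values) / N
    mu3 = sum((x - mean) ** 3 for x in values) / N
    mu4 = sum((x - mean) ** 4 for x in values) / N
    return mean, sigma2, mu3, mu4


def relvar_of_estimated_relvar(values, n):
    """Approximate rel-variance of v^2 = s^2 / xbar^2 (sampling with replacement)."""
    mean, sigma2, mu3, mu4 = population_moments(values)
    beta = mu4 / sigma2 ** 2
    V2 = sigma2 / mean ** 2
    return (beta - 1) / n + 4 * V2 / n - 4 * mu3 / (n * mean ** 3 * V2)


def relvar_of_estimated_cv(values, n):
    """Approximate rel-variance of v = s / xbar: one-fourth of the above."""
    return relvar_of_estimated_relvar(values, n) / 4
```

For the symmetric population 1, 2, 3 the skewness term vanishes, and with $n = 10$ the result is $0.05 + 4/60$.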


2. The rel-variance of the estimated variance for a stratified random sample (Vol. I, Ch. 10, Sec. 7 and 9). To prove: The rel-variance, $V^2_{s^2_{x'}}$, of the estimated variance, with a stratified random sample of $n = \sum_h n_h$ units when $N_h$ is large relative to $n_h$, is


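$$V^2_{s^2_{x'}} = \frac{\displaystyle\sum_h \frac{N_h^4 S_h^4}{n_h^3}\left(\beta_h - \frac{n_h - 3}{n_h - 1}\right)}{\left(\displaystyle\sum_h \frac{N_h^2 S_h^2}{n_h}\right)^2} \qquad (2.1 \text{ or I-10.7.3})$$

where

$$s^2_{x'} = \sum_h \frac{N_h^2 s_h^2}{n_h}, \qquad s_h^2 = \frac{\sum_i^{n_h}(x_{hi} - \bar x_h)^2}{n_h - 1} \qquad (2.2)$$

and

$$\sigma^2_{x'} = \sum_h \frac{N_h^2 S_h^2}{n_h} \qquad (2.3 \text{ or I-5.14.1})$$

$$\beta_h = \frac{\mu_{4h}}{S_h^4}, \qquad \mu_{4h} = \frac{\sum_i^{N_h}(X_{hi} - \bar X_h)^4}{N_h} \qquad (2.4 \text{ or I-10.7.2})$$

with

$$S_h^2 = \frac{\sum_i^{N_h}(X_{hi} - \bar X_h)^2}{N_h - 1} \qquad (2.5 \text{ or I-5.1.2})$$

and $x_{hi}$ is the value of the $i$th unit in the sample from the $h$th stratum, and $X_{hi}$ is the value of the $hi$th unit in the population. The $n_h$ represent the number of units in the sample from the $N_h$ units in the $h$th stratum.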


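Proof. By definition,

$$V^2_{s^2_{x'}} = \frac{\sigma^2_{s^2_{x'}}}{(E s^2_{x'})^2} \qquad (2.6)$$

By Theorem 6, Ch. 3 (p. 49), and Sec. 4, Ch. 4,

$$E s^2_{x'} = \sum_h \frac{N_h^2\,E s_h^2}{n_h} = \sum_h \frac{N_h^2 S_h^2}{n_h} = \sigma^2_{x'} \qquad (2.7)$$

By Corollary 1 to Theorem 11, Ch. 3 (p. 56),

$$\sigma^2_{s^2_{x'}} = \sum_h \frac{N_h^4}{n_h^2}\,\sigma^2_{s_h^2} \qquad (2.8)$$

and by Sec. 5, Ch. 4, with $N_h$ large relative to $n_h$,

$$\sigma^2_{s_h^2} = \frac{S_h^4}{n_h}\left(\beta_h - \frac{n_h - 3}{n_h - 1}\right) \qquad (2.9)$$

Substituting Eq. 2.7 and 2.9 (through Eq. 2.8) into 2.6, we obtain Eq. 2.1. It follows that $V^2_{s^2_{\bar x}} = V^2_{s^2_{x'}}$, where $\bar x = x'/N$.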

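Remark 1. If proportionate stratified sampling is used, i.e., $f_h = n_h/N_h = n/N = f$, $s^2_{x'}$ becomes

$$s^2_{x'} = \frac{N^2 s_w^2}{n} \qquad \text{where} \qquad s_w^2 = \frac{\sum_h N_h s_h^2}{N} \qquad (2.10)$$

and $V^2_{s^2_{x'}} = V^2_{s^2_w}$, and with the substitution $f_h = f$, we have

$$V^2_{s^2_w} = \frac{\sum_h N_h S_h^4\left(\beta_h - \dfrac{n_h - 3}{n_h - 1}\right)}{N n S_w^4} \qquad (2.11 \text{ or I-10.7.6})$$

where

$$S_w^2 = \frac{\sum_h N_h S_h^2}{N}$$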


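Remark 2. If proportionate stratified sampling is used and also the strata are of equal size, i.e., $N_h = \bar N$, and hence also $n_h = \bar n$, and the within-stratum variances are all the same, i.e., $S_h^2 = S_w^2$ for all $h$, then $\bar N = N/L$, and Eq. 2.11 above becomes

$$V^2_{s^2_w} = \frac{1}{n}\left(\bar\beta - \frac{\bar n - 3}{\bar n - 1}\right) \qquad (2.12 \text{ or I-10.9.1})$$

where

$$\bar\beta = \frac{\sum_h \beta_h}{L}$$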


Remark 3. If proportionate stratified sampling is used and also the $\beta_h$ are the same for all strata, say $\beta_h = \beta$ for all $h$, then Eq. 2.11 above becomes


and since 


we may write 


y2 

n NS% 

, _ I.N,SI Y 
N ) 

N } 

NS% 

+ Kk) 


8 


yk = 


N 


(2.13) 


(2.14 or 1-10.9.2) 
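The stratified result can be evaluated from stratum-level inputs. This sketch assumes Eq. 2.1 has the form $\sum_h N_h^4S_h^4(\beta_h - (n_h-3)/(n_h-1))/n_h^3$ divided by $(\sum_h N_h^2S_h^2/n_h)^2$, with $N_h$ large relative to $n_h$; the function name is hypothetical. The checks confirm, under that assumption, the reductions of Remark 1 (proportionate sampling) and Remark 2 (equal strata with equal variances).

```python
def relvar_stratified_variance(N_h, S2_h, beta_h, n_h):
    """Rel-variance of the estimated variance of a stratified estimate.

    N_h: stratum sizes; S2_h: within-stratum variances S_h^2;
    beta_h: mu_4h / S_h^4; n_h: sample sizes (each at least 2).
    """
    num = sum(N ** 4 * S2 ** 2 / n ** 3 * (b - (n - 3) / (n - 1))
              for N, S2, b, n in zip(N_h, S2_h, beta_h, n_h))
    den = sum(N ** 2 * S2 / n for N, S2, n in zip(N_h, S2_h, n_h)) ** 2
    return num / den
```

With four strata of 25 units, 5 sampled from each, common $S_h^2 = 3$, and $\beta_h = 2, 3, 4, 5$, the value is $(1/20)(3.5 - 2/4) = 0.15$.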


*3. Optimum allocation to strata of a subsample for estimating the variance of a stratified random sample (Vol. I, Ch. 10, Sec. 9). Suppose that the original sample consists of a stratified random sample of $n = \sum_h n_h$ sampling units with variance $\sigma^2_{x'} = \sum_h N_h^2 (S_h^2/n_h)$, where $S_h^2$ is given by Eq. 2.5. Suppose also that we wish to estimate the variance $\sigma^2_{x'}$ from a subsample of the original sample.

To prove: The variance of the estimated variance, $\sigma^2_{s^2_{x'}}$, will be close to the minimum when the subsample is allocated as follows:


h 


(3.1 or 1-10.9.3) 


* May be deferred. 




where



4. The rel-variance of an estimated variance based on random group totals (Vol. I, Ch. 10, Eq. 16.3). To prove: a. An unbiased estimate of the population variance based on a simple random sample of $n$ units drawn without replacement from a population of $N$ units may be obtained by subdividing the $n = tk$ units in the sample into $t$ random groups of $k$ units each and computing $s_g^2$, based on the variance in the $t$ random group totals, i.e.,

$$E s_g^2 = S^2$$

where

$$s_g^2 = \frac{\sum_g^t (x_g - \bar x)^2}{k(t-1)} \qquad (4.1)$$

$x_{gi}$ = the value of the $i$th observation in the sample in the $g$th group

$x_g = \sum_i^k x_{gi}$ is the total value of the characteristic for the $k$ observations in the $g$th group

$$\bar x = \frac{\sum_g^t x_g}{t}, \qquad S^2 = \frac{\sum_i^N (X_i - \bar X)^2}{N - 1}$$

and where $X_i$ is the value for the $i$th unit in the population and $\bar X = \sum_i^N X_i/N$.

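b. The rel-variance of $s_g^2$ is given by Eq. 4.2 whenever the sampling is with replacement or the sampling is without replacement and the number of units in the population is large relative to the number in the sample.

$$V^2_{s_g^2} = \frac{1}{t}\left[\frac{\beta}{k} + \frac{3(k-1)}{k} - \frac{t-3}{t-1}\right] \qquad (4.2 \text{ or I-10.16.3})$$

where

$$\beta = \frac{\mu_4}{\sigma^4} \qquad (4.3)$$

$$\mu_4 = \frac{\sum_i^N (X_i - \bar X)^4}{N}$$

and

$$\sigma^2 = \frac{\sum_i^N (X_i - \bar X)^2}{N}$$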

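Proof. a. Consider first $E s_g^2$:

$$E s_g^2 = \frac{1}{k(t-1)}\left[\sum_g^t E x_g^2 - t\,E\bar x^2\right] \qquad (4.4)$$

Since $x_g$ is a total from a simple random sample of $k$ units (from Sec. 2, Ch. 4),

$$E x_g^2 = k^2\bar X^2 + k^2\,\frac{N - k}{N}\,\frac{S^2}{k}$$

and since $\bar x$ is $k$ times the sample mean $\sum x_{gi}/n$ of a simple random sample of $n$ units,

$$E\bar x^2 = k^2\left(\bar X^2 + \frac{N - n}{N}\,\frac{S^2}{n}\right)$$

It follows, by substituting these results in 4.4 and using $n = tk$, that

$$E s_g^2 = S^2 \qquad (4.5)$$

Similarly, if the sampling had been with replacement, we would have

$$E s_g^2 = \sigma^2 = \frac{N - 1}{N}S^2 \qquad (4.6)$$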
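The unbiasedness result of Part a can be checked by exhaustive enumeration on a small population. In this sketch (function names hypothetical), every ordered sample of $n = tk$ distinct units is equally likely, and the first $k$ units form group 1, the next $k$ group 2, and so on, which is equivalent to a random division of the sample into groups; exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import permutations


def random_group_variance(sample, t, k):
    """s_g^2 = sum_g (x_g - xbar)^2 / (k (t - 1)), groups = consecutive blocks of k."""
    totals = [sum(sample[g * k:(g + 1) * k]) for g in range(t)]
    mean_total = Fraction(sum(totals), t)
    return sum((x - mean_total) ** 2 for x in totals) / (k * (t - 1))


def expected_sg2_without_replacement(population, t, k):
    """Average of s_g^2 over all equally likely ordered samples without replacement."""
    draws = list(permutations(population, t * k))
    return sum(random_group_variance(d, t, k) for d in draws) / len(draws)


def population_S2(population):
    """S^2 with divisor N - 1."""
    N = len(population)
    mean = Fraction(sum(population), N)
    return sum((x - mean) ** 2 for x in population) / (N - 1)
```

For the population 1, 4, 6, 9, 15 with $t = 2$ and $k = 2$, the 120 ordered samples give an average $s_g^2$ equal to $S^2$ exactly.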





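b. Consider next $V^2_{s_g^2}$ when the sampling has been done with replacement. By definition,

$$V^2_{s_g^2} = \frac{E s_g^4 - (E s_g^2)^2}{(E s_g^2)^2} = \frac{E s_g^4}{(E s_g^2)^2} - 1 \qquad (4.7)$$

We may also write $s_g^2$ as

$$s_g^2 = \frac{1}{k(t-1)}\left[\sum_g^t x_g^2 - t\bar x^2\right] \qquad (4.8)$$

where $\bar x/k = \sum_g\sum_i x_{gi}/n$. Make the following transformation:

$$z_{gi} = x_{gi} - \bar X, \qquad z_g = \sum_i^k z_{gi} = x_g - k\bar X, \qquad \bar z = \frac{\sum_g^t z_g}{t} = \bar x - k\bar X \qquad (4.9)$$

Since $s_g^2$ is unchanged when the same constant is subtracted from every observation, $s_g^2 = \left[\sum_g z_g^2 - t\bar z^2\right]/k(t-1)$, and

$$E s_g^4 = \frac{1}{k^2(t-1)^2}\left[E\Big(\sum_g z_g^2\Big)^2 - 2t\,E\Big(\bar z^2\sum_g z_g^2\Big) + t^2 E\bar z^4\right] \qquad (4.10)$$

The $z_{gi}$ are independent, with mean 0, variance $\sigma^2$, and fourth moment $\mu_4$, so that $E z_g^2 = k\sigma^2$ and, by Sec. 5, Ch. 4, $E z_g^4 = k\mu_4 + 3k(k-1)\sigma^4$. The first term in Eq. 4.10 becomes

$$E\Big(\sum_g z_g^2\Big)^2 = \sum_g E z_g^4 + \sum_{g \ne g'} E z_g^2\,E z_{g'}^2 = t\left[k\mu_4 + 3k(k-1)\sigma^4\right] + t(t-1)k^2\sigma^4 \qquad (4.11)$$

The second term in Eq. 4.10 becomes, since $t\bar z^2 = (\sum_g z_g)^2/t$ and every product containing an odd power of some $z_g$ has expected value zero,

$$2t\,E\Big(\bar z^2\sum_g z_g^2\Big) = \frac{2}{t}\,E\Big(\sum_g z_g^2\Big)^2$$

which by Eq. 4.11 above

$$= 2\left[k\mu_4 + 3k(k-1)\sigma^4\right] + 2(t-1)k^2\sigma^4 \qquad (4.12)$$

The third term in Eq. 4.10 becomes

$$t^2 E\bar z^4 = \frac{1}{t^2}\left[\sum_g E z_g^4 + 3\sum_{g \ne g'} E z_g^2\,E z_{g'}^2\right]$$

which by Sec. 5, Ch. 4,

$$= \frac{1}{t}\left[k\mu_4 + 3k(k-1)\sigma^4\right] + \frac{3(t-1)k^2\sigma^4}{t} \qquad (4.13)$$

Combining Eq. 4.11, 4.12, and 4.13 into 4.10 and simplifying, we obtain

$$E s_g^4 = \frac{\mu_4}{kt} + \frac{3(k-1)\sigma^4}{kt} + \frac{(t^2 - 2t + 3)\sigma^4}{(t-1)t} \qquad (4.14)$$

Substituting Eq. 4.6 and 4.14 into 4.7, we obtain

$$V^2_{s_g^2} = \frac{1}{t}\left[\frac{\beta}{k} + \frac{3(k-1)}{k} - \frac{t-3}{t-1}\right] \qquad (4.2)$$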
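Because Eq. 4.2 is exact for sampling with replacement, it can be checked against complete enumeration of all $N^{tk}$ equally likely ordered draws from a tiny population. This sketch (function names hypothetical) computes the exact rel-variance of $s_g^2$ by enumeration and compares it with $(1/t)[\beta/k + 3(k-1)/k - (t-3)/(t-1)]$, with $\sigma^2$ and $\mu_4$ on divisor $N$; exact fractions make the comparison an equality.

```python
from fractions import Fraction
from itertools import product


def random_group_variance(sample, t, k):
    """s_g^2 computed from t consecutive groups of k observations."""
    totals = [sum(sample[g * k:(g + 1) * k]) for g in range(t)]
    mean_total = Fraction(sum(totals), t)
    return sum((x - mean_total) ** 2 for x in totals) / (k * (t - 1))


def relvar_eq_4_2(population, t, k):
    """(1/t)[beta/k + 3(k-1)/k - (t-3)/(t-1)], sigma^2 and mu_4 on divisor N."""
    N = len(population)
    mean = Fraction(sum(population), N)
    sigma2 = sum((x - mean) ** 2 for x in population) / N
    mu4 = sum((x - mean) ** 4 for x in population) / N
    beta = mu4 / sigma2 ** 2
    return (beta / k + Fraction(3 * (k - 1), k) - Fraction(t - 3, t - 1)) / t


def relvar_enumerated(population, t, k):
    """Exact rel-variance of s_g^2 over all equally likely draws with replacement."""
    values = [random_group_variance(d, t, k) for d in product(population, repeat=t * k)]
    m1 = sum(values) / len(values)
    m2 = sum(v * v for v in values) / len(values)
    return (m2 - m1 ** 2) / m1 ** 2
```

For the population 1, 2, 6 the enumerated and formula values agree exactly for $(t, k) = (2, 2)$, $(3, 2)$, and $(4, 1)$.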


Exercise 4.1. By the procedure indicated in Sec. 2, extend $V^2_{s_g^2}$ to a stratified sampling case where random groups are set up in each stratum.


*5. The variance of the estimates of the components of the variance in a two-stage sample design with $n$ elementary units in the sample from each first-stage unit in the sample, $N$ elementary units in the population in each first-stage unit, and the sampling carried out with replacement at each stage of sampling (or small sampling fractions) (Vol. I, Ch. 10, Sec. 6).


* May be deferred. 









a. To prove: The variance of an estimate of the between component of 
the variance is 

.m — 3 


°»m-\ Aa\a\, loi mn-\ 4or^,„. 

---— -h -r: + ——r::"' 7T “h 

m 


n(m — 1) mn^ {it — l)(m — 1) mn 


where si — -— is an unbiased estimate of 

m— I n 


M- 1 

M 


M~ 1 ’ m(/i- 1) 


„ _ lSXi:zIl. „ _ p,,y _ j.., 2, 

/^46 — — TYyX^ X) (Xj 


.*.22^; ..d 


Z nx„- W 

MN 


The proof is left to the reader. 
Hint: Let 


where 

Then 


Xij = X + <3^ + ^ii 


di = Xi - X, and A,-,. = a;,,. - X^ 


I(^i - xf 2 
w— 1 nm{n— 1) 


m—l 


where 


m 

2 2(5.- 

-5)(A,-A) 
m—l 

m n 

2 IK 

_[_ ±jX1 _ 

mn{n — 

K 

m 

2 AjAj 

i¥=j 

m(m— 1) 

m 

2'5. 

J> 


m 

IXt 


i 

A,- = - -, 

and A === 

i 


m' 

n 


m 




,m 

- 3 


' m 

m- 


" 

m 

- 1 


. m — 

1 "j 

m 





Also


^ ~ 1 m m(m — 1) 


2aa. 

m m(m ~ 1) 


2 2 a.,a„ 

i 3^1 


+ —£M_ 

n mn{m — 1) 


mn{n—\)\ Tnn{n—\) 


2 r> 4 

£ 

lm{m - 1)J m{m - \)n^ 


b. The variance of an estimate of the within component of the 


variance is 




Exercises

5.1. Prove that


4 rn — J 

M'Sb “ ^0 I A r ^ -I 

—_ — ^ ^ _j_ /^4w __ 1 — 2n) —^ 3 I 

w mn^ mh^ [ m— \ J 

n{m - I) mn mh^ 

where 

m 

2 ^ 2(a- - 
' m- 1 

Show that, when « = 1, K|=. = <T|.,/(&?f reduces to the rel-variance of 

2 rel-variance is given by Eq. 5.1 of Ch 4 

5.2. Show that the covariance of and siJfi is 


CTs^. .s2. In ~ 


/^4w _ 0> 


mn^ mn^ mn^ ‘ mh 


5.3. For a simple random sample of $n$ elements show that

/^4 = /^4w + 4F(3,/i,-3 + 6F^fcrf + 

= al + lalal + at 






*6. Use of known stratum means and totals in estimating variances (Vol. I, Ch. 10, Sec. 19). a. To prove: When the stratum means are known, an unbiased estimate of $S_w^2$, the average within-stratum variance for a proportionate stratified random sample of $n = \sum_h n_h$ units, is

$$\frac{1}{n}\sum_h^L \frac{N_h}{N_h - 1}\sum_i^{n_h}(x_{hi} - \bar X_h)^2 \qquad (6.1 \text{ or I-10.19.1})$$

where $n_h$ is the number of units selected from the $h$th stratum, and

$$S_w^2 = \frac{\sum_h N_h S_h^2}{N} \qquad (6.2 \text{ or I-5.3.8})$$

where $S_h^2$ is defined by Eq. 2.5.

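Proof. By Theorem 6, Ch. 3 (p. 49), the expected value of Eq. 6.1 is

$$\frac{1}{n}\sum_h^L \frac{N_h}{N_h - 1}\sum_i^{n_h} E(x_{hi} - \bar X_h)^2$$

Since $\bar X_h$ is known, the probability of obtaining $(x_{hi} - \bar X_h)^2$ is the probability of obtaining $x_{hi}$, namely, $1/N_h$. Therefore,

$$E(x_{hi} - \bar X_h)^2 = \frac{1}{N_h}\sum_i^{N_h}(X_{hi} - \bar X_h)^2 = \frac{N_h - 1}{N_h}S_h^2$$

and, since $n_h/n = N_h/N$ for proportionate sampling, the expected value of Eq. 6.1 is

$$\frac{1}{n}\sum_h n_h S_h^2 = \frac{\sum_h N_h S_h^2}{N} = S_w^2$$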
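The unbiasedness argument can be checked by enumerating every stratified sample of a small population. This sketch assumes the estimate has the form $(1/n)\sum_h [N_h/(N_h-1)]\sum_i (x_{hi} - \bar X_h)^2$ with known stratum means and proportionate allocation; the function names are hypothetical, and samples within each stratum are drawn without replacement and independently across strata.

```python
from fractions import Fraction
from itertools import combinations, product


def known_mean_estimate(samples, strata):
    """(1/n) sum_h N_h/(N_h - 1) sum_i (x_hi - Xbar_h)^2, using known stratum means."""
    n = sum(len(s) for s in samples)
    total = Fraction(0)
    for sample, stratum in zip(samples, strata):
        N_h = len(stratum)
        mean_h = Fraction(sum(stratum), N_h)      # the known stratum mean
        total += Fraction(N_h, N_h - 1) * sum((x - mean_h) ** 2 for x in sample)
    return total / n


def S2_w(strata):
    """Average within-stratum variance sum_h N_h S_h^2 / N."""
    N = sum(len(s) for s in strata)
    acc = Fraction(0)
    for stratum in strata:
        N_h = len(stratum)
        mean_h = Fraction(sum(stratum), N_h)
        acc += sum((x - mean_h) ** 2 for x in stratum) / (N_h - 1) * N_h
    return acc / N


def expected_estimate(strata, n_h):
    """Exact expectation over all simple random samples drawn independently by stratum."""
    per_stratum = [list(combinations(s, m)) for s, m in zip(strata, n_h)]
    draws = list(product(*per_stratum))
    return sum(known_mean_estimate(d, strata) for d in draws) / len(draws)
```

With strata (1, 3, 8, 10) and (2, 5) and a sampling fraction of 1/2 in each, the 12 possible samples average exactly to $S_w^2$.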



b. To prove: When the stratum totals are known, an unbiased estimate of the between-psu contribution to the variance for a multi-stage stratified sampling design in which primary units are selected with replacement from the $h$th stratum is:


^2 _1_ 2 T 2 (— — ^3 (6.3 or 1-10.19.2) 

LX^hmni\PM ' 


where $P_{hi}$ is the probability of selecting the $i$th primary unit from the $h$th stratum, $X_{hi}$ is the total of the $X$-characteristic in the $i$th psu of the $h$th stratum,

* May be deferred.





(6.4)


Proof. By Theorem 6, Ch. 3 (p. 49),

Since $X_h$ is known, the probability of obtaining $(X_{hi}/P_{hi} - X_h)^2$ is the probability of obtaining $X_{hi}$, namely, $P_{hi}$. Therefore,


and 





2 


Eb\ = B% 


7. Confidence limits for the median and other position measures* (Vol. I, Ch. 10, Sec. 18). Let $x_A$ and $x_B$ be the estimated $A$th and $B$th percentiles of a distribution derived from a sample, where $A$ is less than 50 per cent and $B$ is greater than 50 per cent. Let $p$ be the percentage of items in the sample which are less than $\theta$, the true median.

To prove: $x_A$ and $x_B$ constitute confidence limits for $\theta$ with the following probability:

$$\Pr(x_A < \theta < x_B) = \Pr(A < p < B)$$

Proof. Let us assume that a given population is arrayed by size. The point $\theta$ (the true median) divides the population into two equal parts.† Let us assume further that a sample is drawn from this population, and by proper weighting of each sample item a reflection of the original population is obtained and ordered by size. This is illustrated graphically by the cumulative frequency curves in Fig. 1. The solid line is the cumulative frequency curve of the population; the dashed line represents the cumulative frequency curve derived from a sample.

Now let us draw two arbitrary horizontal lines across the graph to represent the limits $A$ and $B$. Then, since these curves are nondecreasing, $\theta$ is within the limits $x_A$ and $x_B$ if and only if $p$ is within the limits $A$ and $B$.

* Proof developed by Ralph S. Woodruff (4).

† Note that this condition is not met where more than a negligible proportion of the population has a value exactly equal to $\theta$.



Symbolically, this can be expressed as*

$$\Pr(x_A < \theta < x_B) = \Pr(A < p < B)$$

Since it is often possible to make meaningful statements about probabilities on the right-hand side of this equation, the limits on the left-hand side constitute usable confidence limits for the median. For example, if simple random sampling is used, the probability of $p$ falling within the limits $A$ and $B$ can be calculated by summing appropriate terms of the binomial distribution (either directly or by means of tables of the incomplete $\beta$ function). More generally, if large samples are used (with any type of probability sample design), the distribution of $p$ will often be near normal and a usable estimate of $\sigma_p$ can frequently be derived from the sample.† In this case we can obtain meaningful confidence limits for


* For the application of this principle with simple random sampling, see, for example, “On Confidence Ranges for the Median and Other Expectation Distributions for Populations of Unknown Distribution Form,” by W. R. Thompson, Annals Math. Stat., 7 (1936), 122-128; “Order Statistics,” by S. S. Wilks, Bull. Amer. Math. Soc., 54 (1948), No. 1, p. 14; and A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill, 1950, pp. 388-389.

† Theoretically $\sigma_p^2$ should be the variance of the percentage of items less than $\theta$, the true median. Since in practical problems $\theta$ is not known, the value of $\sigma_p^2$ must be estimated as the variance of the percentage of sample items less than $\theta'$, the median derived from the sample distribution. Where large samples are used, the substitution of $\theta'$ for $\theta$ has little effect on the estimate of $\sigma_p^2$.


the median by choosing our arbitrary limits $A$ and $B$ so that $A$ = 50 per cent minus $K\sigma_p$, and $B$ = 50 per cent plus $K\sigma_p$ ($K$ being any positive number).

The method can be applied directly to other position measures as well 
as the median. 
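For simple random sampling the right-hand probability can be computed from the binomial distribution, and for large samples the limits can be set at 50 per cent plus or minus $K\sigma_p$. This sketch assumes a simple random sample from a large population with no ties at $\theta$, so that the number of sample items below $\theta$ is Binomial($n$, 1/2), and takes $\sigma_p = 0.5/\sqrt{n}$ (as a proportion) for the normal case; the function names are hypothetical.

```python
from math import comb, sqrt


def median_confidence(n, A, B):
    """Pr(A < p < B) when the number of sample items below the true median
    is Binomial(n, 1/2); A and B are proportions (0.2 means 20 per cent)."""
    return sum(comb(n, j) for j in range(n + 1) if A < j / n < B) / 2 ** n


def normal_limits(n, K):
    """Limits A, B = 50 per cent minus/plus K sigma_p, with sigma_p = 0.5 / sqrt(n)."""
    sigma_p = 0.5 / sqrt(n)
    return 0.5 - K * sigma_p, 0.5 + K * sigma_p
```

For $n = 10$, taking $A$ = 20 per cent and $B$ = 80 per cent gives a confidence coefficient of $912/1024 \approx 0.89$ for the interval between the sample 20th and 80th percentiles.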




REFERENCES

(1) U.S. Bureau of the Census, Sampling Staff, A Chapter in Population Sampling, U.S. Government Printing Office, Washington, D.C., 1947.

(2) U.S. Bureau of the Census, Notes on Precision of Sample Estimates (Reprint of Appendix Section of the Special Study: Value of Farm Products by Color and Tenure of Farm Operators), U.S. Government Printing Office, Washington, D.C., 1945.

(3) M. H. Hansen and W. N. Hurwitz, “Relative Efficiencies of Various Sampling Units in Population Inquiries,” J. Amer. Stat. Assn., 37 (1942),

(4) Ralph S. Woodruff, “Confidence Intervals for Medians and Other Position Measures,” J. Amer. Stat. Assn., 47 (1952), 635-646.



CHAPTER 11 


Regression Estimates, Double Sampling, 
Sampling for Time Series, and 
Other Sampling Methods 


DERIVATIONS, PROOFS, AND SOME EXTENSIONS OF 
THEORY FOR CH. 11 OF VOL. I* 


Note. Many of the developments presented in this chapter can be applied 
as variants or extensions of the methods already introduced. Included among 
the topics covered are alternative methods of estimation; double sampling 
(often referred to as two-phase or multi-phase sampling); techniques for selection of sample units that introduce negative correlations in the selections between
strata and thus have the effect of extending the depth of stratification; an 
approach for estimation of characteristics of each unit in a population from a 
sample of such units; and sampling for time series. 


1. The difference and regression estimates (Vol. I, Ch. 11, Eq. 2.1, 2.7, 2.2, 2.10). a. The difference estimate. To prove: The difference estimate, Eq. 1.1, is an unbiased estimate of $X$, and its variance is given by Eq. 1.2.

Proof. Let $x'$ and $y'$ be random variables that are unbiased estimates of $X$ and $Y$, and let $k$ be any arbitrary constant. Then the “difference estimate” of $X$ is

$$x'_d = x' + k(Y - y') \qquad (1.1 \text{ or I-11.2.1})$$

Since $x'$ and $y'$ are unbiased estimates of $X$ and $Y$,

$$E x'_d = E x' + k(Y - E y') = X$$

and hence the difference estimate is an unbiased estimate of $X$.


* Appropriate references to Vol. I are shown in parentheses after section 
or subsection headings. The number following I- after some equations gives 
the chapter, section, and number of that particular equation in Vol. 1. 


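The variance of the difference estimate, $x' + k(Y - y')$, is

$$E\left[x' + k(Y - y')\right]^2 - X^2 = E\left[x'^2 + 2kx'(Y - y') + k^2(Y - y')^2\right] - X^2$$

$$= \sigma_{x'}^2 + k^2\sigma_{y'}^2 - 2k\rho_{x'y'}\sigma_{x'}\sigma_{y'} \qquad (1.2 \text{ or I-11.2.7})$$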
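Eq. 1.2 is a quadratic in $k$, and minimizing it gives $k = \rho_{x'y'}\sigma_{x'}/\sigma_{y'} = \beta$ with minimum $\sigma_{x'}^2(1 - \rho_{x'y'}^2)$, which is the variance used for the regression estimate in Part b. A minimal sketch with hypothetical function names, taking the sampling covariance of $x'$ and $y'$ as $\rho_{x'y'}\sigma_{x'}\sigma_{y'}$:

```python
def difference_estimate(x_est, y_est, Y, k):
    """x' + k (Y - y'): unbiased for X whenever x', y' are unbiased and Y is known."""
    return x_est + k * (Y - y_est)


def difference_variance(k, var_x, var_y, cov_xy):
    """Eq. 1.2: var(x') + k^2 var(y') - 2 k cov(x', y')."""
    return var_x + k ** 2 * var_y - 2 * k * cov_xy


def best_k(var_y, cov_xy):
    """The k minimizing Eq. 1.2: cov(x', y') / var(y') = rho sigma_x' / sigma_y'."""
    return cov_xy / var_y
```

With $\sigma_{x'}^2 = 4$, $\sigma_{y'}^2 = 1$, and covariance 1.2 (so $\rho = 0.6$), the best $k$ is 1.2 and the minimum variance is $4(1 - 0.36) = 2.56$.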

b. The regression estimate. To prove: The regression estimate, Eq. 1.4, is a consistent estimate, and its variance is

$$\sigma_{x''}^2 \approx \sigma_{x'}^2\left(1 - \rho_{x'y'}^2\right) \qquad (1.3)$$

when $x'$, $y'$, and $b$ are consistent estimates of $X$, $Y$, and $\beta$, respectively. The values of $X$, $Y$, and $\beta$, defined below, can be estimated from sample designs described in this book, such as stratified sampling, cluster sampling, and multi-stage sampling. In each case, one should investigate the precision of the approximation of Eq. 1.3.

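Proof. Let

$$x'' = x' + b(Y - y') \qquad (1.4 \text{ or I-11.2.2})$$

and let

$$x''_\beta = x' + \beta(Y - y')$$

Since $x'$, $y'$, and $b$ are consistent estimates of $X$, $Y$, and $\beta$, it follows from Corollary 2, Theorem 20, Ch. 3, that $x''$ is a consistent estimate of $X$. To obtain the variance of $x''$ we can write

$$x'' = x''_\beta + (b - \beta)(Y - y')$$

so that

$$E(x'' - X)^2 = \sigma^2_{x''_\beta} + 2E(x''_\beta - X)(Y - y')(b - \beta) + E(b - \beta)^2(Y - y')^2 \qquad (1.5)$$

The variance of $x''_\beta$ is given by substituting $\beta = \rho_{x'y'}\sigma_{x'}/\sigma_{y'}$ for $k$ in Eq. 1.2 and is equal to

$$\sigma^2_{x''_\beta} = \sigma_{x'}^2\left(1 - \rho_{x'y'}^2\right) \qquad (1.6)$$

where

$$\beta = \frac{E(x' - X)(y' - Y)}{E(y' - Y)^2}$$

If $\sigma_{x''}^2$ is to be approximated by $\sigma^2_{x''_\beta} = \sigma_{x'}^2(1 - \rho_{x'y'}^2)$, the remainder term is

$$2E(x''_\beta - X)(Y - y')(b - \beta) + E(b - \beta)^2(Y - y')^2 \qquad (1.7)$$

and it remains to show that, for a sufficiently large sample size, these terms are negligible compared to $\sigma^2_{x''_\beta}$. For example, if $x'$ is an estimate of the total $X$ from a simple random sample of size $n$, $\sigma_{x'}^2$ is of the order $1/n$.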




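By the inequality given on p. 56, Ch. 3,

$$\left|E(x''_\beta - X)(Y - y')(b - \beta)\right| \le \left[E(b - \beta)^2\,E(x''_\beta - X)^2(Y - y')^2\right]^{1/2} \le \left[E(b - \beta)^2\right]^{1/2}\left[E(x''_\beta - X)^4\right]^{1/4}\left[E(Y - y')^4\right]^{1/4}$$

Since $x''_\beta$ and $y'$ are arithmetic means, it follows that

$$E(x''_\beta - X)^4 = O(n^{-2}), \qquad E(Y - y')^4 = O(n^{-2})$$

where $O(n^{-2})$ indicates that the term is of the order of $1/n^2$ in $n$, so that

$$\left[E(x''_\beta - X)^4\right]^{1/4} = O(n^{-1/2}), \qquad \left[E(Y - y')^4\right]^{1/4} = O(n^{-1/2})$$

Also, since $b = u/w$ is a ratio estimate, where $u$ and $w$ are the estimates of the numerator and denominator of $\beta$, we have

$$E(b - \beta)^2 = \beta^2\left(V_u^2 + V_w^2 - 2\rho_{uw}V_uV_w\right) + O(n^{-2})$$

Since

$$V_u = O(n^{-1/2}) \qquad \text{and} \qquad V_w = O(n^{-1/2})$$

we have

$$E(b - \beta)^2 = O(n^{-1})$$

and this analysis would indicate that

$$E(x''_\beta - X)(Y - y')(b - \beta) \text{ is } O(n^{-3/2})$$

However, since products and ratios of arithmetic means must have integer orders, it follows that this term is at least of order $n^{-2}$, and this term will be small relative to $\sigma^2_{x''_\beta}$ for sufficiently large $n$.

Also

$$E(b - \beta)^2(Y - y')^2 \le \left[E(b - \beta)^4\,E(Y - y')^4\right]^{1/2} = \left[O(n^{-2})\,O(n^{-2})\right]^{1/2} = O(n^{-2})$$

and we have the second term of the remainder (Eq. 1.7), $E(b - \beta)^2(Y - y')^2$, of order $O(n^{-2})$. Then

$$\sigma_{x''}^2 = \sigma^2_{x''_\beta} + O(n^{-2}) \qquad (1.8)$$

and the remainder term will be small relative to $\sigma^2_{x''_\beta}$ for sufficiently large $n$.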

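2. A consistent estimate of the regression coefficient, $\beta$ (Vol. I, Ch. 11, Eq. 2.6, 2.15, and 2.16). Let the regression coefficient of $x'$ on $y'$ be defined as follows:

$$\beta = \frac{E(x' - X)(y' - Y)}{E(y' - Y)^2} \qquad (2.1)$$

or for stratified sampling as:

$$\beta = \frac{\sum_h E(x'_h - X_h)(y'_h - Y_h)}{\sum_h E(y'_h - Y_h)^2} \qquad (2.2)$$

Then, a consistent estimate of $\beta$ is given by the ratio of consistent estimates of the numerator and of the denominator of Eq. 2.1 or 2.2.

In the special case where

$$x'_h = \frac{1}{m_h}\sum_i^{m_h}\frac{x_{hi}}{P_{hi}} \qquad \text{and} \qquad y'_h = \frac{1}{m_h}\sum_i^{m_h}\frac{y_{hi}}{P_{hi}}$$

then

$$\sum_h^L \frac{1}{m_h(m_h - 1)}\sum_i^{m_h}\left(\frac{x_{hi}}{P_{hi}} - x'_h\right)\left(\frac{y_{hi}}{P_{hi}} - y'_h\right) \qquad (2.3)$$

is an unbiased and consistent estimate of the numerator of Eq. 2.2, and

$$\sum_h^L \frac{1}{m_h(m_h - 1)}\sum_i^{m_h}\left(\frac{y_{hi}}{P_{hi}} - y'_h\right)^2 \qquad (2.4)$$

is an unbiased and consistent estimate of the denominator of Eq. 2.2. Therefore $b$ is a consistent estimate of $\beta$, where

$$b = \frac{\text{Eq. 2.3}}{\text{Eq. 2.4}} \qquad (2.5)$$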
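The stratified estimate of the regression coefficient can be computed directly from psu-level data. This sketch assumes the special case in which $x'_h$ and $y'_h$ are means of $x_{hi}/P_{hi}$ and $y_{hi}/P_{hi}$ over $m_h \ge 2$ psu's drawn with replacement with probabilities $P_{hi}$, and that $b$ is the ratio of the pooled within-stratum covariance and variance terms; the function name and data layout are hypothetical.

```python
def stratified_b(strata):
    """b = (sum of covariance terms) / (sum of variance terms), Eq. 2.3 / Eq. 2.4 form.

    strata: one list per stratum of (x_hi, y_hi, P_hi) for the m_h >= 2 psu's drawn.
    """
    num = den = 0.0
    for psus in strata:
        m = len(psus)
        u = [x / p for x, _, p in psus]           # x_hi / P_hi
        v = [y / p for _, y, p in psus]           # y_hi / P_hi
        u_bar, v_bar = sum(u) / m, sum(v) / m     # x'_h and y'_h
        num += sum((a - u_bar) * (c - v_bar) for a, c in zip(u, v)) / (m * (m - 1))
        den += sum((c - v_bar) ** 2 for c in v) / (m * (m - 1))
    return num / den
```

When every $x_{hi}$ is exactly twice $y_{hi}$, the estimate is exactly 2, whatever the selection probabilities.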






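3. Variance and optimum allocation for double sampling with regression estimates (Vol. I, Ch. 11, Eq. 3.1-3.5). To prove: Suppose that an estimate of $\bar X$ is wanted from a sample, that $\bar y$ is the average value of some related characteristic obtained at low unit costs from a large sample of size $n$ selected by simple random sampling, and that $\bar x'$ is the sample average of the $X$'s obtained from a smaller sample of size $n'$, also obtained by simple random sampling, which is a subsample of the larger sample of size $n$. Similarly, $\bar y'$ is the sample mean of the $Y$'s for this same subsample. Then an estimate of $\bar X$, using the regression estimate with simple random sampling, is

$$\bar x'' = \bar x' + b(\bar y - \bar y') \qquad (3.1 \text{ or I-11.3.1})$$

where $b$ is the estimate of $\beta$, the coefficient of regression of $x$ on $y$, i.e.,

$$b = \frac{\sum_i^{n'}(x_i - \bar x')(y_i - \bar y')}{\sum_i^{n'}(y_i - \bar y')^2} \qquad (3.2 \text{ or I-11.3.2})$$

a. The variance. The variance of $\bar x''$ is given approximately by

$$\sigma^2_{\bar x''} = \frac{\sigma_x^2(1 - \rho^2)}{n'} + \frac{\rho^2\sigma_x^2}{n} \qquad (3.3)$$

where $\rho$ is the coefficient of correlation between $X_i$ and $Y_i$.

Proof. By Sec. 1, Ch. 11, for sufficiently large $n'$, the variance of estimate 3.1 is given approximately by the variance of

$$\bar x''_\beta = \bar x' + \beta(\bar y - \bar y') \qquad (3.4)$$

The expected value of estimate 3.4 is

$$E\left[E(\bar x''_\beta \mid n)\right] = \bar X \qquad (3.5)$$

and hence estimate 3.4 is unbiased.

The variance of estimate 3.4 is derived from Theorem 15 of Ch. 3 (p. 65), which, as applied to this problem, states that

$$\sigma^2_{\bar x''_\beta} = \sigma^2_{E(\bar x''_\beta \mid n)} + E\,\sigma^2_{\bar x''_\beta \mid n} \qquad (3.6)$$

That is, the variance of $\bar x''_\beta$ is equal to the variance of the expected value of $\bar x''_\beta$ for a fixed sample of $n$ units plus the expected value of the conditional variance of $\bar x''_\beta$ for a fixed sample of $n$ units. The first term on the right-hand side of Eq. 3.6 is the variance of $\bar x = E(\bar x''_\beta \mid n)$, the mean of $x$ over the large sample, and is, assuming sampling with replacement,

$$\sigma^2_{\bar x} = \frac{\sigma_x^2}{n} \qquad (3.7)$$

The second term of Eq. 3.6 is

$$E\,\sigma^2_{\bar x''_\beta \mid n} = \left(\frac{1}{n'} - \frac{1}{n}\right)\sigma_x^2(1 - \rho^2)$$

since, for a fixed large sample, $\bar x''_\beta - \beta\bar y$ is the mean of $x_i - \beta y_i$ over a simple random subsample of $n'$ of the $n$ units, and $x_i - \beta y_i$ has variance $\sigma_x^2(1 - \rho^2)$. Then Eq. 3.6 becomes

$$\sigma^2_{\bar x''_\beta} = \frac{\sigma_x^2}{n} + \left(\frac{1}{n'} - \frac{1}{n}\right)\sigma_x^2(1 - \rho^2) \qquad (3.8)$$

$$= \frac{\sigma_x^2(1 - \rho^2)}{n'} + \frac{\rho^2\sigma_x^2}{n} \qquad (3.3)$$

which is approximately equal to the variance of Eq. 3.1.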

b. Optimum allocation for a simple cost function. Assume a simple cost function:

$$C = C_1 n + C_2 n' \qquad (3.9)$$

where $C_1$ is the unit cost of including an element in the large sample, $C_2$ is the unit cost of including an element in the small sample, and $C$ is the total cost, varying with the number of elements in the sample; $C_1$ is assumed to be considerably smaller than $C_2$ for the design to have practical significance.

To prove: The optimum values of $n$ and $n'$ such that the variance, Eq. 3.3, is minimized subject to the fixed cost, $C$, are:

$$\text{opt. } n = \frac{C\rho}{\rho C_1 + \sqrt{(1 - \rho^2)C_1C_2}} \qquad (3.10 \text{ or I-11.3.4})$$

and

$$\text{opt. } n' = n\sqrt{\frac{(1 - \rho^2)C_1}{\rho^2 C_2}} \qquad (3.11 \text{ or I-11.3.5})$$

Proof. Define the Lagrangian function:

$$F = \frac{\sigma_x^2(1 - \rho^2)}{n'} + \frac{\rho^2\sigma_x^2}{n} + \lambda(C_1 n + C_2 n' - C)$$

An expression for $n$ in terms of $\sqrt\lambda$ is obtained from setting $\partial F/\partial n = 0$, and an expression for $n'$ in terms of $\sqrt\lambda$ from $\partial F/\partial n' = 0$. Substituting these expressions for $n$ and $n'$ in Eq. 3.9 yields the expression for $\sqrt\lambda$, and hence Eq. 3.10 and 3.11.




OTHER SAMPLING METHODS 


Ch. 11 


4 Condirion for cost and variance of single and double sampling designs 
to be equivalent (Vol. I, Ch. 11, Eq. 3.6). To prove: The optimum 
double sampling design with the regression estimate will have the same 
cost and variance as for a single (simple random) sample design with a 
simple unbiased estimate when 

„2 _(4.1 or 1-11.3.6) 

' (Cl + C^f 

Proof. The variance of the double sampling design having the minimum 
variance subject to a fixed cost, C, is expressed by substituting the optimum 
values of n and n' from Eq. 3.10 and 3.11 into the expression for the 
variance, Eq. 3.3. Under these conditions 3.3 becomes: 


4' (opL) 


[pVC, 


P^)C,Y 


where the cost function is presumed to be _ 

/l - Cl ,, 

C = CiU + C^n' = Cpi + 

The variance of a single (simple random) sample of, say, m units is 


The cost of such a single sample is assumed to bi 


C = Com 


From Eq. 4.4 and 4.5, we have 


We have now to find the condition under which the variances 4.2 and 4.6 
are equal when their total costs 4.3 and 4.5 are equal. It follows that 
the variances will be equal with equal total costs when Eq 4.1 holds. 
If we require now that only the costs of these two designs be the same 
and determine the condition for which the optimum double sampling 
design has a smaller variance than the single sample, we have from 
Eq. 4.2 and 4.6 


4-4 = f [pVQ-t 

It follows that this inequality holds for 

„. 4C1C2 


p2)CJ^ > 0 


Sec. 5 


DOUBLE SAMPLING 


257 


Therefore, for values of/> that satisfy this inequality, the double sampling 
design shows a gain over the single sampling design. 

5. Variance and optimum allocation for double sampling with stratiflca- 

u It desired to estimate a 

total characteristic, X, from a population of N units, where of these 
units have a particular characteristic, Z, and the remaining N„ = N~ A, 
units have the characteristic W (i.e., they are non-Z's), and it is assumed 
that stratification into the Z and W groups would be advantageous. 
( or example, the units might be farms or business establishments, and 
the charactenstic Z might apply to those establishments larger than some 
s^cified size; or the distinction between the Z’s and W’s might be that 
the Z s responded to a mailed questionnaire within a specified period of 
time, and the W’s did not, etc.) 

Using a double sampling approach, an initial large simple random 
sample of n units is drawn from the population of N. In the n units, 

are found to possess attribute Z, and from the remaining n, units 
having attribute W a simple random subsample of n' units is selected. 

The total sample size is 

a. The estimate. To prove: If is the aggregate of a characteristic for 
fte Hi units in the sample with attribute Z and x, for the 4 units ultimately 
included in the sample with attribute W, then an unbiased estimate of Vis 


, k 


(5.1 or 1-11.3.8) 


where/= n/A is the sampling fraction for the initial large sample of units, 
and k — is the reciprocal of the subsampling fraction. 

Proof. Estimate 5.1 may also be written 

, A^ An,/= 

ft i 

where represents the characteristic for the ith unit in the sample with 
attribute Z, and X2i for the /th unit in the subsample with attribute W. 

The conditional expected value of the estimate 5.1 for a fixed sample 
size of units from a fixed set of iig units is 

E(x'\„,n,) = ^lx,, + ^”ix,, 

n i n i 

= n F-' (5.2) 

where represents the characteristic in the ith unit in the initial sample 






258 


OTHER SAMPLING METHODS 


Ch. 11 


regardless of whether it has attribute Z or W. The expected value of 
Eq. 5.2is E[Eix'\n,,4)]=^Ex' = X 

and estimate 5.1 is therefore unbiased. 
b. Variance. To prove: The variance of estimate 5.1 is given by 

8^+ -(k- l)iV 2 S| (5.3 or I-l 1.3.9) 
Nn n 


al 


where 


with 






X 


N- 1 


and 


with 


SI 


N 

j_ 

N 


N. 

2 (^ 2 . 

i 


X.f 


No - 1 


^2 


Xo = 


No 


Proof. The variance, Eq. 5.3, is derived by use of Theorem 15 of 
Ch. 3 (p. 65), which, as applied to this problem, states 




(5.4) 


the variance of x' is equal to the variance of the expected value of x' for 
a fixed number, of units from a fixed sample of ^2 units having attribute 
W plus the expected value of the conditional variance of x' for a fixed 
number, of units from a fixed sample of n 2 units having attribute W. 

The first term on the right of Eq. 5.4 is merely the variance of estimate 
5 .2, which is ^ 


N2 


Nn 




(5.5) 


The conditional variance of 5.1 for a fixed number, /? 2 > of units from a 
fixed set of n 2 units having IE is ^ 


f 


(5.6) 



Sec. 5 
where 


DOUBLE SAMPLING 


259 




^2=:^- 

«2 

The expected value of Eq. 5.6 for a fixed number, n^, of W units is 


rhjk— 1 ) 

P 


-s-l 


(5.7) 


There IS still the condition on formula 5.7, however, that it is the variance 
xed sample size of n^, so that the expected value of 5.7, which is 
the second term on the right of Eq. 5.4, is 


r 2 in 

= ~(k~l)]V^SI (5,g) 

5.5, and 5.8, we have the variance of estimate 5.1 as given 
allocation of sample sizes for minimum cost subject to a 
(1) The following simple cost relationship is assumed; 

C = C„n + Qn, -t- (5.9 or 1-11.3.11) 

where C is the total cost of the survey less any fixed overhead costs that 
do not vary with the allocation of the sample. 

Co IS the unit cost of selecting and examining a unit included in the 

large sample and determining whether it is included at the full 
rate or not. 


From Eq. 5.4, 
in Eq. 5.3. 

c. Optimum 
fixed variance. 


y-'i xa uic ciuuiiionai 


Eu u 7 - ^ ^ included at 

e lull large sample rate but not in the subsample. 

Co is the additional cost per unit of the subsampled units actually 
included in the subsample. ^ 

Let us now determine the optimum allocation for a fixed error and 
E™^ ^’'Pected cost is given by the expected value 

EC = n (^C„ + QPi -t- 


where is the proportion of all units that have attribute Z; P^ 


1 - P,. 




260 OTHER SAMPLING METHODS Ch. 11 

If we specify that the standard error of estimate 5.1 be equal to e, the 
values of n and k which minimize the expected cost are 

„^%U+{k~ 1)^2 fs) (5-10 or 1-11.3.12) 




- P^Sl 


PiSl Cl + (CqIPi) 


where 


N8^ 


n — 


+ (e^lN) 


(5.11 or 1-11.3.13) 


(5.12 or 1-11.3.14) 


is the size of sample that would be required to achieve the specified 
accuracy with a simple random sample. 

This is seen by constructing the function 


F(n. k^X) ~ n 


Q + + 


+ A 


jY2 


k J 
N-n 


N 


Nn n 




Setting SFjdn and dFjSk equal to zero and eliminating A and n, the 
expression 5.11 for k follows. An expression for n in terms of k follows 
from setting 3F/3A = 0. This expression is easily shown to be Eq. 5.10, 
where the value of n, given in Eq. 5.12, satisfies the relationship 


Nn 


(5.13) 


Exercise 5.1. Assume that we have a population consisting of L strata with 
A/j first-stage units in the Ath stratum, and Aj, second-stage imits m e i 
first-stage unit. Assume that a simple random sample of m* first-stage units 
is selected from the M,. units in the hth stratum, and «« second-stage units are 
c«ler,ted from the hith first-stage unit in the sample. Assume further that 


Pin Pm = f i e„ a uniform sampling fraction is used in the /ith stratum, and 

Mn Nm 

that 

Pm. ^ ^ 

Nu Nn 


where 



Sec. 5 
Let 


DOUBLE SAMPLING 


261 


L rrih 

2 =« 

h i 

the total number of second-stage units in the sample, represent the initial sample 
of a double sampling design. Assume that of the n units in the sample are 
irom class 1, i.e., have some specified characteristic (such as being large farms 
or large stores or respondents to a mailed questionnaire) and are retained in 
the sample, and that the remaining n ~ rii ^ cases in the initial sample are 
in class 2, i.e., do not have the specified characteristic, and that we draw a simple 
random sample of 1 in A: from those remaining cases. 

Now, in the /ith stratum, let (dropping the subscript h) 


= 72 




J i j 


where ~ if the i/ih selection is a member of class 1 
= 0 otherwise. 


^ 2 ij — ^27j if th© ^'th selection is a member of class 2, 
= 0 otherwise. 


rii is the number of elements in the /th first-stage unit in the sample, 
i^ number of elements subsampled from the elements in class 
2, with 2 « 2 ^• = ■^ 2 - 

i 


A^ 2 ^ is the number of elements in class 2 in the/th first-stage unit. 

Show that 

al' = 2 ] + 2 ]) 

vvhere the first term of the right-hand expression is the expectation of the condi¬ 
tional va.riance of x' for a fixed set of m first-stage units and a fixed set of n 
observations in the ;th sample first-stage unit and is equal to * 


Ea 


2] 


where 


Mk~ 1 


/ 


m 


m 


^^2 

4 



w N-i, 

2 2 ^ 20 ' 

i j 


2A^: 


■2i 


IN, 


and the second term of the right-hand expression is the variance of the condi¬ 
tional expected value of x' for a fixed set of m first-stage units and a fixed set 
of Hi observations in the /th sample first-stage unit and is equal to 


2 ]) = 


M 




a 02 


Mm 


NmEh 





262 

where 


OTHER SAMPLING METHODS 


Ch. 11 



^6, Estimate and variance of Latin-square design (Vol. I, Ch. 11, Eq. 

4.2~4.8).* Suppose that a population consisting of M = L^M units is 
classified into L “columns” of LM units each, and that each column is 
classified into L “rows” of M units each. Let the parts into which the 
whole population is thus classified be called “cells,” so that each cell 
consists of M units. Select a cell at random from the first column. 
From the second column select at random any cell except that in the row 
selected from the first column. Continue ih this way, selecting at random 
in the rth column a cell from any of the L — r + 1 rows not selected for 
columns 1, 2, • • •, (r — 1). In each selected cell, choose at random one 
of the M units. There will then be m = L units in the sample. 

Let be the value of the ith unit selected from the cell in the aih 
row in column b, and consider the estimate of the total 

X' = (6.1 or 1-11.4.1) 

J a b i J 

where / = mjM is the sampling fraction, and x is the aggregate value of 
a characteristic for the units in the sample. 

To prove: x! is an unbiased estimate of 

L L M 
a b i 

and the variance of x' is 

^2, _ _1_ (32 _ g2 _ 32) + (6,2 or 1-11.4.3) 

^ m - 1 ^ m 


* This result is due to Jerome Cornfield and W. Duane Evans. 

* May be deferred. 


Sec. 6 
where 


LATIN-SQUARE DESIGN 


263 


L L 




a b 
L 


n = i-’ 2(A'„. - = z.%f 


ai = Xf = L^al 


with 


Si 

a h % ’ 


M 


ah 


2-^<i6>> ^ah — ^aJXf, X„. — 2^o 6> ^a- = X^.jL 


'»'■!> = X., = XJL, x=2^„. = 2^.| 


and 


;? = xjU' 

Proof, (a) Since \ If = LM, 

Ex' = LME21 I 2 hx,,, 


a b % a b i 

— £ ^ \ 1 L L M 


a h i 


(6.3) 


a b i 

L L M 

= 12 IXaU = ^ 

a b % 

Thus,is an unbiased estimate of X. 

{h) We now consider the variance of x'. We may write 

gI^ = + <^l]{x'\e) 

where cr^/|g denotes the conditional variance of x' for a fixed selection of 
cells and E{x'\c) denotes the conditional expectation of x' for a fixed 
selection of cells. 

We first note that 

E{x’\c) = LM22E2i^,,,\c) 

ah i 

__ L 1 \ M 

= L 22 Ea, 

a b 





264 


OTHER SAMPLING METHODS 


Ch. 11 


where denotes the population total for the cell selected in the ath 
column. 

To evaluate we write 

2 


E{E{x'\c) — Ex'Y ^ E — xj 

= L2J^(|ixJ 

\a b } 


(6.4) 


Now consider 

' L 1 


L \ L L \ L 

=-27 2^«. + 2 £7731^ 2 

a E i) -LAE / b¥^d 


b ^ d 


L \ L L \ 

= 272 ^L + 2 7^^^ 

a E b a^cE\^lu i; 

\ L L 1 

= 722 ^^.+ 7 ^ 1 ^ - 

E d b i-} a^c 




1 ^ 
—-— 2 
LiL- l)t 


2 

(F»)(F-)-F“^“] 

a^c \ b / 


1 L L 

7-4 2 2^a. 

E— 1 a b 


i2T(?-+H+S 


and Eq. 6.4 becomes 
2 _ 

<^E(x'\c) — T I 


L L 

227 ? 


(If--) 




( 6 . 6 ) 


where is the variance among the cell totals. 

a\ is the variance among the means per cell from the columns. 
a\ is the variance among the means per cell from the rows. 


Now consider 

E{{x' - E{x'\c)f\c) 


(6.7) 



Sec. 7 A GENERAL OPTIMUM 

For a sample of one unit per cell 


265 


.2 __ 

Vic 


where 


a b 


a 


1 M 
M i 


abi 


Xa,f 


and is the variance within the cel! selected in the ath column. Now 


a b 

u ^ b 

Combining these results into Eq. 6.3, we have 

1 


( 6 . 8 ) 




where 




1 L h 

Z2<‘ 


ab 


(6.9 or 1-11.4.7) 


a b 

is the average variance within cells. 

It therefore follows from the definitions of m, fij, a\ given above that 

1 


al, = 


— + ™ (6.2) 

It should be noted that this approach easily generalizes to other 
experimental designs. 

1. The optimum allocation of sample and the optimum weights for 
estimating a ratio from a stratified sample (VoL I, Ch. 11, Sec. 6). To 
prove: Consider a population divided into two strata and let 






where 


Vv 




(7.1) 






= 


1 i 


i 


are estimated aggregates for stratum I, and 


”2 i 


, No 

% = — IVi 

i 


* May be deferred. 




266 OTHER SAMPLING METHODS Ch. 11 

are estimated aggregates for stratum 11, and are unbiased estimates of 
Xi, Ti, X-i, and Y^, respectively; the constitute a simple random sample 
from the Ni elements of stratum I, the Wg elements constitute a simple 
random sample from the elements of stratum II, and the selections are 
made independently from the two strata. Also, and are weights to be 
applied to the stratum estimates, where + Wj = 1. Then 


where 


(m'i 


i) (7.2) 


and 


Ri = Y 

Sl = Rl(VU+Vlr-2yixY) 
Sl^RliVlx+ylr-^y^xr) 


X 

Y 


(7.3) 

(7.4) 


and Vlx, ^iy, and V^xr are the rel-variances and covariance in the first 
stratum of the X^, and Y^^, and the terms are similarly defined in the 
second stratum. Moreover, the values of w^, w^, n^, «2 which minimize 


estimate 7.1 subject to the condition that 
determined by 



T, 

-Sf 


— '^1^2 ^ 

y'(7?i-7?2)^- 

N^- 


r 



(s,->Si)^ + «[(Ai-R,)^--y 

^2- 


Wo = 1 and ni + n are 


(7.5 or 
1-11.6.3) 


SI-SA + 




R^f 


No 


Y, 


(«1 - ^ V yJ Ai 


The optimum value of Wg is obtained by subtracting in Eq. 7.5 from 
1 and of ua by subtracting the value of in Eq. 7.6 from n. 

Proof. For sufficiently large values of n-^ and 

E{r- Kf = E[wl{r^ - i^i)' + - R^f] 

+ — E) + 1 V 2(^2 7 ?) 1 ^ 




Sec. 7 
where 


A GENERAL OPTIMUM 


267 


and 


Vi 

By substituting the values for 

and 

E(r,~-R,f = 


N^-n 


^2 

2/2 

'^Sl 


Ao 


N 2^2 


SI 


where SI is given by Eq. 7.3 and <S| by Eq. 7.4,* and ^ ^ 1 

for R and 1 — for Wa in expression 7.7 above, we obtain 

EK, H-i) = .v? + (I )2 - 

\«1 JV,/ \„_„J at/ 


7 ') 


i?. 


+ 




R,r 


(7.8) 


If we set dF/d/tj^ = 0 and dFjdw^ = 0 and solve for n, and w,, we obtain 
Eq. 7.5 and Eq. 7.6. 

Exercises 

7.1. Find the optimum values of and ivg for fixed values of n^. The 

solution to this problem is appropriate when the sample allocation is predeter¬ 
mined. ^ 

7.2. Show that as (/?i - increases, and if the other terms are not sub¬ 
stantially affected, the optimum value of w-, approaches YJY 

13. Show that ^ 

iS'? iS'2 

E(R^- R^f == 


Ai A 


if it is assumed that the first stratum was made up by drawing a random sample 
of Aj establishments from some very large population for which the raho 
Ai/Fi = R, and in which the variance defined by Eq, 7.3 was equal to and 
if the second stratum was made up in an analogous manner from another 
population in which XJY^ = R and was the variance defined by Eq. 7 4 
7.4. Show that if ^ 


(R, - R,)2 = 


^<3 

ffi IV2 


(see Ex. 7.3) 


wi (opt.) 




(S, - 


* From Vol. II, Ch. 5, Sec. 3. 





Ch. 11 


268 OTHER SAMPLING METHODS 

Under these conditions, .and if the first stratum is made up of large establish¬ 
ments and the second stratum of small establishments, so that Y-^ — YifNi is 
larger than fg == L 2 /A 2 , show that (opt.) increases as increases relative 
to and, as a practical matter, for large enough relative to Y^ (and for 
the other terms constant), the optimum values are then 

wi I, VV 2 = 0 

rti = n, n 2 — 0 

See Vol. I, Ch. 11, Sec. 6, for a discussion of this case. 

8. Sampling on two occasions (Vol. I, Ch. 11, Sec. 7). Consider a 
population of N units in existence over a period of time, where N is large 
relative to the sizes of samples to be drawn or sampling with replacement 
is assumed.* Suppose that on the first of two occasions a simple random 
sample of n units is selected. Retain a simple random sample of Pn of 
these units for the second occasion, and supplement these by a simple 
random sample of Qn independently selected units, where P + Q == L 
Thus, the second sample is also of size n. Let 

x' =- mean per unit for the first period, for the Pn units that are common 
to the two samples. 

x" = mean per unit for the first period, for the Qn units that are in the 
first sample only. 

y' = mean per unit for the second period, for the Pn units that are 
common to the two samples. 

y" = mean per unit for the second period, for the Qn units that are in 
the second sample only. 

a. An estimate of the mean. We wish to estimate T, the mean for the 
second period, by a linear estimate ot the form 

y = ax -f bP + cy + dy" 

Since E£' ^ Ex' ^ X and Ef = Ef = Y, we find that 
Ey^{a-\-b)X-Y{c + d)Y 

If we now require that y be an unbiased estimate of f, we must have 
u + c p- d — \ 

so that 

y -- a{x - x') + cf + (1 - c)f (8.1) 

* N large relative to n (or sampling with replacement) is assumed for sim¬ 
plicity and as an approximation. The results can be extended to the case where 
njNh large relative to 1 and the sampling is without replacement. 







Sec. 8 SAMPLING ON TWO OCCASIONS 269 

The variance of y is 



^ _L. £! d: I ~ 2i 
n P n Q n 


'2iClC Y 

_ — ^ 


( 8 . 2 ) 


where a\ is the population variance of an individual observation in the 
first period, a\ is the population variance in the second period, and p is 
the correlation between the first and second periods for an observation 
on the same sampling unit. 

We wish to choose values of a and c that minimize (t|. Equating to 
zero the derivatives of with respect to a and c, it follows that the 
optimum values are 


P^Q Q'f 
1 - 


(8.3) 


P 

1 - 


(8.4) 


Thus, the estimate with optimum values for a and c may be written 
pPQ Gy 

{ X — X ) -h - -r— y - 

1-G^P 


Vn 


1 “ Q^P^ 
and its variance is 






1 - Q^P 
-p^Q 


n 1 - 


(8.6 or 1-11.7.3) 


Equating to zero the derivative of with respect to Q, we find that for 
a fixed sample size n the variance of will have its minimum value if 
we choose 


Q- 


1 - Vl - p2 


(8.7) 


Note that, if the estimate given by Eq. 8.5 is somewhat 

simplified, but its variance is unchanged. Note, also, that an estimate 
for the first occasion is given by Eq. 8.5, simply by interchanging #s and 
T’s if the estimate for the first occasion can await a time until data for 
both occasions are available. 

b. Estimates of the change. One possible obvious estimate of the 
change Y— Jf is 

A - P{y' - .r') + Q{f ~ E') (8.8) 

whose variance is 

pCTx^F) 


(8.9) 




270 OTHER SAMPLING METHODS Ch. 11 

If we consider the more general linear estimate of the change of the form 
ax' + bx' + cy' + df 

subject to the condition that this provide an unbiased estimate of Y- X, 
we find that we must take a + 6 = — 1, c + d = 1. Following the 
same procedure as for the estimate of the mean in the second period, we 
find that the estimate that minimizes the variance is 


1- ev ^ 1- 


+ 


PQp 




( 8 . 10 ) 


1 - Q^p^ 

In the special case that Cj, = the estimate is greatly simplified to the 
P ... (g.ll) 


form 


1 - Gp 


& 


’) _|- - — {y" — x") 

^ 1 - Gp 


The variance of A,„ (Eq. 8.11) is 

2 _ 2(1 - 
n(l - Gp) 


( 8 . 12 ) 


and its development is left as an exercise. 

Note that the estimate of change given by Eq. 8.10 is exactly the 
estimate that would be obtained if both X and Y were estimated from 
Eq. 8.5 and the difference — computed as the estimate of 

Y _ X. 

It is also to be noted that for p > 0 Eq. 8.12 is a minimum for 2 = 0, 
i.e., the variance will be minimized if the units on both occasions are 

identical and p is positive. _ 

c. Estimate of the sum of the means. By the same approach, we obtain 

as the optimum estimate of E + X the statistic 


2 


V) 


G(i - Gp^) 

1 - G“p^ 


{x" + y") + 


p 

1 - GV^ 


{x' + y') 


+ 


PQp 


1 - G^p" 




In the special case that the optimum estimate is again greatly 

simplified, so that it may be written 


z 


w 


P 

1 + Gp 


(x' + y) + 


G(1 + P) 
1 + Gp 


(x"+f) 


The variance of z,„ is readily obtained. 



Sec. 8 SAMPLING ON TWO OCCASIONS 271 

Note that 2 ;^ = + y^, where is given by Eq. 8.5, and is given 

by Eq. 8.5 with X's and T’s interchanged. 

d. Joint estimates of change and the means. It has been seen that, if 
the required timing of the survey estimates is such that the results from 
the samples for both occasions can be used in preparing estimates for 
each occasion, then the use of Eq. 8.5 to obtain estimates of the mean 
for each occasion also results in an estimate of the sum and of the 
difference that is the best linear estimate (i.e., smallest variance) that can 
be made from the data from the two samples. Often, however, estimates 
must be made for the first occasion before sample results from, the second 
occasion are available; the initial estimate must be made from the first 
sample only, and it may not be feasible to revise this initial estimate. 
Thus, suppose that we have estimated the mean on the first occasion as 


Px' + Qx' 

Suppose then that we wish to estimate both the mean on the second 
occasion and the change from the first to the second occasion in such a 
way that the ekimated change is the difference between the estimated 
means. Let us denote the estimated mean on the second occasion by y 
and the estimated change by A, and require that these are to have the 
forms: , „ 

y ^ ax bx + cf + dy' 

A = ex' +fx" + gf + hy" 

where the coefficients are constants. We have already required that 

A ~ y — X 

If we require further that Ey = f,y and A may be written 
^ ^ (e + P)x' - (c + P)x" + cy' + (1 - c)y" 

A = - (e + l)x" + cy' + (1 - c)y" 


It may be useful to determine the constants e and c so as to minimize a 
linear function of the variances of y and A. Without loss of genemlity, 
we may minimize 

where vr is a specified positive number. The solution to this problem is 
straightforward and yields 


P 

1 - 

p 

1 - 


( 

( 


w 

w + 1 




Qp Ox ^ 

W + I (T^ 



■) 







OTHER SAMPLING METHODS 


Ch. 11 


In the special case that a* = and tv = 1 (i.e., we wish to minimize 


a\ + al), the estimates are thus 


PQp(2- 


+ Qp^ jg' 

QY) ^ 


. P{2 2Qp Q^p^) 

2(1-ev) 2(1-ev^) 

^9. Sampling for a time series* (Vol. I, Ch. 11, Sec. 7). a. The 
sample model From a universe of size N, twelve independently selected 
samples are chosen at random, each of size n. One of the twelve is 
enumerated in the first month of each calendar year, a second m the 
second month of each calendar year, a third in the third month, and so 
forth, the twelfth being enumerated in the twelfth month of each calendar 

During the enumeration, each member of the sample reports both 
sales for the current month and also sales for the month previous to the 
current month. After each enumeration a simple unbiased estimate 
is made of the total sales for the current month, and from the same 
enumeration a simple unbiased estimate of total sales is also made for 
the previous month, u-\. Let a:, and respectively, represent these 
estimates as obtained on the wth enumeration. 

b, A composite estimate and its variance. 

(1) The estimate. An estimate, x'", of the total sales for the latest 
month, u, is given by 

x: = Kix'"_ I+ x,~ 2 /„_i) + K^u 1) 

where K K 2 = I with 0 < A < 1. 

The estimate may also be written as 

x'” = + (Xu - ^2/«-i) (9-2) 

In the special case where if = 1, an estimate analogous to a chained 
estimate results, while if = 0 makes the estimate the simple unbiased 

estimate. . . , . 

(2) Variance of the monthly total, x'f To obtain this variance, we 

shall first express the estimate for the latest month, w, in terms of all the 
simple unbiased estimates of totals which have been made. 

Since, in general, 

<' = if<li + (*i-%i^i) 

we can write, after multiplying both sides of this equation by if“ *, 

__ -V /r\ 


QV) 

P{2 + Qp) 




By Max A. Bershad, Bureau of the Census. 
May be deferred. 


Sec. 9 


SAMPLING FOR A TIME SERIES 273 

Substituting Zi = and summing both sides of 9.3 from the 

first month (i = ]) through the latest month (i = u), we have 

U 

2 = 2 4 - 2 

'^= I t = l 

= + 2 ( 9 , 4 ) 

Cancelling the similar terms on both sides of Eq. 9.4, 

K=-K'^x; + iK-~% ( 9 . 5 ) 

i --1 

But if we take, as the initial estimate (for the first month) in this time 
series, = Kx^ -4- 2 :^ -= x-^, Eq, 9.5 becomes 


X 


ftt 

u 


u 


K’‘~\ + 2 K«~% 

7-2 


(9.6) 


In the following it is assumed that the variances of the estimates x^ and 
are equal and are the same from month to month (i.e., al. = ^ 

Similarly, it will be assumed that the m.onthly correlations between a:, and 
are all equal to p. Remembering that all the 2 ’s are independent 
of one another except those for months which are a year apart (or 
multiples of a year) and taking 


as 


= 17^^^ + K^al , - 2Ka,, 
= (1 - IKp + K^)al 




as 


1 


“ (1 ~ for integral r 


when 

and 

Vi-l-nr I/,- 



q .2 ^2 

'-X 



we have from, Eq. 9.6, ignoring the yearly correlations with rrj. 


(9.7) 

(9.8) 

(9.9) 


= + al 2 + 22 

)-2 r-=] 


z,z-\ 2r 


u 

2 

:]2i' + 2 


1^14+ 12r—I 


(9.10) 


Substituting j u — /; we have 


74-2 


2 K‘^> + 22 


j = o 


12r 


f =--1 


- 2 - 12 ?' 


(9.11) 










274 OTHER SAMPLING METHODS Ch. 11 

Performing the indicated summations, and substituting Eq. 9.7 and 9.8 
in 9.11, we have 

\ _ 

cr|,, = + crf.(l - 2Kp + K^} ^ 

-j _ 12r) 

+ 2(1 - Kf41^ K^'^'pi2r - IZTT, - (9.12) 

When u is large and when terms involving the twelfth or higher powers of 
K can be ignored, Eq. 9.12 becomes 

9 . J\-2Kp + K^\ 


To find the value of K that makes a minimum, the derivative of crf.t 
with respect to K is taken and set equal to zero. It is found that 

1 — Vl — rQ . . X 


minimizes and, substituting this value in 9.13, we find that at its 


minimum 


= O'! Vl — 


Denote as the special case of x'^' which is obtained when ~ 1. 
Then from Eq. 9.2 

~ ^u-1 Vu-i (9.16) 

Since Eq. 9.13 does not apply when K is equal to 1, we must refer to 
Eq. 9.12. Substituting = 1 in Eq. 9.12 and remembering that the 

1 - 


we have 


lim-— — u — 1 

ir“ 1 - 

(r,A = ff![l + 2(l-p)(u- 1)] 


In the special case of = 0, = 1, <' becomes the simple unbiased 

estimate a;„ with variance tr^. 

(3) Variance of the month-to-month change, x" — x'"_y. From Eq. 9.6, 
we may write 

A'" = < - <'_i = l)zi + + 2 /f"''”') (9-18) 


K'‘-\K— l)a:i + «„ + 2 


(9.19) 



Sec. 9 SAMPLING FOR A TIME SERIES 

al. = ^ Kf + - 2Kp + K^) 

+ 2 (1 - 


275 


/I - A'\2 (11-2) 
l+( — ) 2 ^ 2 . 


M-12r-2 

/•=! ] 


- 2 


0--^f ,2 


K 


< 2 

r=] 


(9.20) 


Equation 9.20 ignores yearly correlations involving x^. Performing the 
indicated summations, we have 


2 


„2^2(«-2)(i - Kf + aid 


2Kp + K^) 


' 2 - - 2 )(i K) 

\+K _ 


. (1 - Kf 
Kd + K) 


<^1 2 [1 


^ j^2(M-12r-2) + ] 


] 


(9.21) 


When u is large and when terms involving the twelfth or higher order of 
can be ignored, Eq. 9.21 becomes 


a%, ~ 2al 


\~-lKp^ K^~ 

1 + A' _ 


Comparison with Eq. 9.13 shows that 


(9.22) 


4 .. - 2(1 - K)ala (9.23) 

In the special case where AT — 1, the variance of the difference 
Xy, — becomes on substitution in Eq. 9.21 


«-i' - 2o:|(l ^ p) 

(9.23a) 

In the special case where K == 0, the variance of the difference between 
two independent simple unbiased estimates becomes 

4 = 2al 


(4) Variance of the total of 12 months, x'" = x"'. 

i^li- 1 1 

Since by Eq. 9.6 

x: = K'<~% +’2V«„_. 

j = 0 

(9.6a) 

V /i-zc-i+n i-a:'2 »-2 

[~K^^ 

- I^U~ 12_ 

l~K ^ 


(9.24) 







276 

Then 


OTHER SAMPLING METHODS 


Ch. 11 


^2 ___ ^2 
— (7^ 




4- al fi—L?1 


\-K 


.V Tv 

2 2 0'..^-12r [Zj l_^j\ \- k )^ 


M-2~12r /I _ /rl2\2 ■ 

+ .2 


The above ignores yearly correlations with the initial value, x-^^. Per¬ 
forming the indicated summations and substituting Eq. 9.7, 9.8, and 9.9 
in 9.25, simplifying, and then dropping terms in K of the twelfth order or 
higher, assuming u to be large, Eq. 9.25 becomes 

In the special case of ^ = 1 and « = 12, there are no yearly correlations, 
and the variance of the annual total is derived by utilizing only the first 
term in the coefficient of and the term in Eq. 9.25. 

After evaluating the indeterminate forms, and, because x^ is one of the 
first 12 terms, summing the coefficient of cr^ only through 10, we have 

^^--crf2(7+l)^+12K 

7 = 0 




( 12 - 1 )[ 2 ( 12 )- 1 ] 


= + (l-p) 

In the special case where K = 0 


al = 2 ( 1 ) + < = 12 ^' 12 ^ 


(9.27) 


(9.28) 


(5) Variance of the month-to-month-a-year-ago change, 

From Eq. 9.5 or 9.6a, we may write 

< - <-,2 = 2 f - x,K-^\\ - K'-^) 

j=o 3 = 12 (^929) 





Sec. 9 SAMPLING FOR A TIME SERIES 

The variance of < - is equal'to 


277 


11 


/A32- 1\2 


I + 

J'o I A12 


22^^ 


\2 u~: 

j 2 

^ 12 




+ 1)2 


z,z—12r 

r = l Lf = 


[,l- (^) - 

u~2~12r / _ 1 \ 2 


f ]2r 


(9.30) 


Ignoring yearly correlations with the initial value performing the 
indicated summations, dropping terms of high order of A, assuming large 
w, and simplifying, we obtain 

+ (9.31) 


•<'» J'u-li 2 2 _ 


Substituting Eq. 9.7 and 9.8 in 9.31 yields 


[2(1 - IpK + A2) - 2pi3(l - Kf] 


(9.32) 

(9.33) 


J __ ^2 

For the special case where K 0, Eq. 9.32 becomes 

2(1 ~ Pi2)<yl 

c. An analogous composite estimate and its variance. Analogous to the 
composite estimate 

. . ~ ^u~ Vu-d (9.1) 

IS the estimate 


- Kx[ 


B'^u — _ "f 


Vn-l 


(9.34) 


The variance of ^^x'" can be written as 

+ al^ ) + 2((t^» - <r^„, j] 

+ Kllal} 

+ 2KK^[a^,,. ^^ ^ ^,5) 

The variance of nx'" can be written as 


I Ey„, ) 


+ VI + VI ) 


+ 2(F,..,„ - K 
+ (ExJ^KliVlj 
Ex'U(ExJ 


R’u, y,, 


,)] 


+ 




(9.36) 





OTHER SAMPLING METHODS 


Ch. 11 


Now, if Ex";, Ex„, and are approximately equal in level, and 

if both sides of Eq. 9.36 are divided by the square of this level, the 
resulting equation will be 9.35 with F’s in place of c s. 

For ready reference, the following table shows the equation numbers 
of the variances of estimates based on jx'll, which are given in this section, 
and the corresponding equation numbers of the analogous rel-variances 
based on which are given in Vol. 1. 


Estimate 


Equation number of the variance of 
the specified estimate 


In Vol. I, based 
on jsxZ 


In this volume, 
based on ^ 4 x 1 


Monthly level 7-23 

Annual level 7.25 9. 

Monthly change 7.27 y.^3<3 

Monthly level 7.30 9. 

Monthly change T33 

Annual level 7.35 y- o 

Month-a-year-ago change 7.36 _ 

Remark 1. It will be readily noticed from Eq. 9.1 or 9.34 that the
composite estimate $x'_u$ does not involve an unbiased estimate $y_u$ for the same
month. The reason, of course, is that the observations necessary to make
this estimate are obtained during the $(u + 1)$st enumeration, at which time
observations for both $x_{u+1}$ and $y_u$ are obtained simultaneously, at very little
additional cost for obtaining $y_u$. Consequently, as a general rule $y_u$ is
obtained one month too late to be used in making the estimate $x'_u$ for the
$u$th month.
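The recursion in Eq. 9.1 can be sketched as follows. This is a minimal illustration, not the Bureau's procedure: the helper name and arguments are hypothetical, and starting the series with $x'_1 = x_1$ is an assumption. Note that month $u$ uses only $x_u$ and $y_{u-1}$ from the month-$u$ report, never $y_u$, in line with Remark 1.

```python
def composite_estimates(x, y_prev, K):
    """Composite estimates x'_1, ..., x'_T from Eq. 9.1 (hypothetical helper).

    x[u]      -- unbiased estimate of sales for month u (0-based list index)
    y_prev[u] -- unbiased estimate of sales for month u-1, reported with x[u];
                 y_prev[0] is ignored because there is no earlier month
    K         -- composite weight, assumed to satisfy 0 <= K < 1
    """
    if len(x) != len(y_prev):
        raise ValueError("x and y_prev must have the same length")
    est = [x[0]]  # assumed starting rule: x'_1 = x_1
    for u in range(1, len(x)):
        # x'_u = K (x'_{u-1} + x_u - y_{u-1}) + (1 - K) x_u
        est.append(K * (est[-1] + x[u] - y_prev[u]) + (1 - K) * x[u])
    return est


# Made-up monthly figures, for illustration only.
print(composite_estimates([10, 12, 11], [0, 9, 13], 0.5))  # [10, 12.5, 10.75]
```

With $K = 0$ the recursion returns the unbiased monthly estimates $x_u$ unchanged; larger $K$ carries more of the previous composite forward through the month-to-month change $x_u - y_{u-1}$.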

However, in many circumstances it is possible to obtain the observations
for $y_u$ in time to be used for the $u$th estimate. (If one were willing to pay
for obtaining $y_u$ and $x_{u+1}$ separately rather than together, this could always
be done.) When $y_u$ can be obtained in time, an improved estimate for
monthly level over Eq. 9.1 is

[equation illegible in source]    (9.37)

where, as before, $y_u$ as well as $x_u$ is an unbiased estimate of the sales for the
$u$th month, and $K$ = [illegible; the expression involves $\sqrt{1 - \rho^2}$].


The different variances of the estimate (9.37) can be derived by methods
similar to those used for $x'_u$. The variance of the monthly level of sales,
for example, is smaller than that of $x'_u$ and will be found to be

[equation illegible in source]    (9.38)


(These latter results coincide with the results given by H. D. Patterson in
“Sampling on Successive Occasions with Partial Replacement of Units” (6)
[...].)

Remark 2. If the sample model is altered so that the sample for any
month is independent of that for any other month (i.e., the samples for
the months of one year are not repeated in subsequent years), then it can
be shown that the best linear unbiased estimate for month $u$ ($u$ being
large) of the form

$$a_0 x_u + a_1 x_{u-1} + a_2 x_{u-2} + \cdots - b_1 y_{u-1} - b_2 y_{u-2} - \cdots$$

is the composite estimate (9.1). Similarly, the best linear unbiased estimate
for month $u$ ($u$ being large) of the form

$$a_0 x_u + a_1 x_{u-1} + a_2 x_{u-2} + \cdots + a_{u-1} x_1 + b_0 y_u + b_1 y_{u-1} + b_2 y_{u-2} + \cdots + b_{u-1} y_1$$

is the composite estimate (9.37).
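The first claim of Remark 2 can be checked numerically under a simple version of its independent-sample model. The following are our assumptions, not the book's notation: each monthly report yields $x_u$ and $y_{u-1}$ with common variance $\sigma^2$ and correlation $\rho$, and reports for different months are independent. Unrolling Eq. 9.1 gives $x'_u = \sum_j K^j (x_{u-j} - K y_{u-j-1})$, so for large $u$ the variance of $x'_u$ is $\sigma^2(1 - 2\rho K + K^2)/(1 - K^2)$. Setting its derivative to zero gives $K = (1 - \sqrt{1 - \rho^2})/\rho$ and a minimum of $\sigma^2\sqrt{1 - \rho^2}$. This derivation is offered as an illustration, not as a transcription of the book's (illegible) formulas, and the helper names are hypothetical.

```python
import math
import random


def level_variance(K, rho, sigma2=1.0):
    """Large-u variance of x'_u under the assumed independent-sample model."""
    return sigma2 * (1 - 2 * rho * K + K * K) / (1 - K * K)


def optimal_K(rho):
    """Root in [0, 1) of rho*K**2 - 2*K + rho = 0 (zero of the derivative)."""
    if rho == 0:
        return 0.0
    return (1 - math.sqrt(1 - rho * rho)) / rho


def simulate_level_variance(K, rho, months=40, reps=4000, seed=1):
    """Monte Carlo check: run Eq. 9.1 on simulated errors about the true level."""
    rng = random.Random(seed)
    finals = []
    for _ in range(reps):
        est = rng.gauss(0.0, 1.0)  # error of x'_1 = x_1
        for _ in range(months - 1):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            x_err = z1                                          # error in x_u
            y_err = rho * z1 + math.sqrt(1 - rho * rho) * z2    # error in y_{u-1}
            est = K * (est + x_err - y_err) + (1 - K) * x_err
        finals.append(est)
    mean = sum(finals) / reps
    return sum((f - mean) ** 2 for f in finals) / (reps - 1)


rho = 0.8
K = optimal_K(rho)
print(K, level_variance(K, rho), simulate_level_variance(K, rho))
```

For $\rho = 0.8$ the sketch gives $K$ of about 0.5 and a variance of about $0.6\,\sigma^2$, against $\sigma^2$ for $x_u$ alone; the simulated value should land close to the formula.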


REFERENCES

(1) W. G. Cochran, “Sampling Theory When the Sampling Units Are of
Unequal Size,” J. Amer. Stat. Assn., 37 (1942), 199-212.

(2) J. Neyman, “Contributions to the Theory of Sampling Human Populations,”
J. Amer. Stat. Assn., 33 (1938), 101-116.

(3) Chameli Bose, “Notes on the Sampling Error in the Method of Double
Sampling,” Sankhya, 6 (1943), 329-330.

(4) M. H. Hansen and W. N. Hurwitz, “The Problem of Non-Response in
Sample Surveys,” J. Amer. Stat. Assn., 41 (1946), 517-529.

(5) R. J. Jessen, “Statistical Investigation of a Sample Survey for Obtaining
Farm Facts,” Iowa Agr. Exp. Stat. Res. Bull. 304, 1942.

(6) H. D. Patterson, “Sampling on Successive Occasions with Partial
Replacement of Units,” J. Roy. Stat. Soc., Series B, 12 (1950), 241-255.

(7) W. G. Madow and L. H. Madow, “On the Theory of Systematic Sampling,
I,” Annals Math. Stat., 15 (1944), 1-24.

(8) W. G. Madow, “On the Theory of Systematic Sampling, II,” Annals Math.
Stat., 20 (1949), 333-354.

(9) F. Yates, “Systematic Sampling,” Phil. Trans. Roy. Soc., Series A, 241
(1948), 345-377.

(10) W. G. Cochran, “Relative Accuracy of Systematic and Stratified Random
Samples for a Certain Class of Populations,” Annals Math. Stat., 17
(1946), 164-177.

(11) W. G. Cochran, Sampling Techniques, John Wiley & Sons, New York,
1953, Chapters 7, 8, and 12.





CHAPTER 12 


Response Errors in Surveys* 

1. Role of nonsampling errors in determining survey design. As discussed
in Ch. 2 of Vol. I, the nonsampling errors in a survey involving original
collection of data may often be a more serious problem than the sampling
errors. Many of the limitations placed on our choice of sample designs
arise out of response error rather than sampling error considerations
(more precisely, arise from the joint consideration of response and
sampling errors). As an example, one basic limitation imposed by the
Census Bureau in designing its Current Population Survey (Case Study B 
in Ch. 12 of Vol. I) was that there be a full-time supervisor for each 
primary sampling unit. This limitation had a very substantial influence 
upon the cost equation for the CPS and, in consequence, upon the 
ultimate decision regarding the number of psu’s to be used. The decision 
to have a full-time supervisor for each psu was not based on sampling 
considerations but on the belief that close supervision of the interviewing 
process would reduce nonsampling errors. Fewer supervisors would, in 
fact, have allowed the use of more psu’s and a reduction in the sampling 
error. Implicitly, such a decision assumed that the reduction in response 
error achieved by increased supervision outweighed any increases in 
sampling error which might result. Actually, very few data are available 
for determining whether, in fact, decisions made on the nonsampling
features of survey design contribute to an improvement in the over-all 
accuracy and value of a survey. 

The paucity of dependable data on response errors is unquestionably 
the greatest present obstacle to sound survey design. In survey after 
survey, losses in sampling efficiency are taken on the basis of quite dubious
assumptions about the magnitudes and distributions of response errors.
Frequently, this point is obscured by the implicit (practically “unconscious”)
nature of survey designers’ assumptions regarding response error.

For a demonstration of the relationship between response and sampling
errors in survey design, the student is referred to an article,† “A Case
History in Survey Design: The Post-Enumeration Survey of the 1950
Census.” This article outlines the numerous decisions which had to be
made and indicates the mixture of opinion and habit which had to be
relied on in making such important decisions as those on questionnaire
design, interviewer selection, length and type of interview, and training
and supervision of interviewers.

* This chapter represents a minor revision of a paper by Morris H. Hansen,
William N. Hurwitz, Eli S. Marks, and W. Parker Mauldin (9).

† Marks, Mauldin, and Nisselson (7).

Sec. 2 REQUIREMENTS ON MATHEMATICAL MODEL 281

Although work on the measurement of response errors is relatively new, 
several excellent analyses of sources and types of error are available. A 
summary of the main sources of response errors is included in Ch. 2 of
Vol. I. For more extensive discussions of this topic the student is referred
to papers by Deming (3), Marks and Mauldin (6), Marks, Mauldin, and
Nisselson (7), and Ackoff and Pritzker (1). Mahalanobis (5) has developed
several important techniques for measuring and controlling response
errors, particularly those arising from the interviewer.

Most of this chapter is devoted to the explicit formulation of a mathematical
model for “response errors.”* An essential preliminary to such
a formulation is a determination of some of the important requirements 
that a mathematical model should meet in order to make it conform 
reasonably well to actual survey conditions. One important feature of all 
survey designs is the estimating procedure. The processes of sampling, 
data collection, coding, and tabulating introduce “errors” into survey 
results. These errors may be affected by the choice of an estimating 
procedure. The present chapter does not involve consideration of the 
relationship between response errors and choice of estimating procedures. 

2. Some requirements on a mathematical model for response errors. In
defining “error” we start with an “estimate” and a “value estimated.”
The “estimate” is some value determined from the survey data and, for
any particular survey, is a definite number but varies from survey to
survey. In many surveys, the “value estimated” is not defined explicitly,
and the problem of survey design is complicated by vagueness regarding
what is being measured. However, if the aim is orderly planning of a
survey rather than catch-as-catch-can methods, it is essential that the
“value estimated” be defined precisely.

Estimating an average or aggregate. A common type of “value
estimated” in social surveys is one which is an average or an aggregate of
the values for the individual elements that make up the population.
Each element of the population has attached to it some value of a variable,
and we want to know the average or the aggregate for some or all of these
values. For the present, we shall consider only the case in which we are
estimating an average or aggregate for all the population elements, or
an average or aggregate for a subgroup of the population, where the
members of the subgroup are identified as such without error.

* In the broadest sense, “response error” as used here includes both processing
and data collection errors. The present chapter is oriented primarily towards
data collection errors, but most of the discussion is directly applicable to the
control of processing errors.

282 RESPONSE ERRORS IN SURVEYS Ch. 12

In making a sample survey to estimate a population aggregate or 
average, we observe the values of some of the population elements and 
derive the estimate from these observed values. The fact that we have 
selected for observation some but not all the elements ordinarily introduces 
some error (sampling error). In addition, we frequently find that there 
are response errors in the individual observations. Thus, even if we were 
to observe all the elements of the population (i.e., take a census), we would
usually have an error in our estimate of the population average or 
aggregate. 

It should be noted that errors of nonresponse play a peculiar role. 
Failure to secure a response can be considered a sampling bias on the 
assumption that the “nonresponse” elements have a zero chance of 
inclusion in a sample. Failure to secure a response can also be considered 
a response error, since any estimating procedure involves assigning values 
to the nonresponse elements either implicitly or explicitly; e.g., estimating 
the population average on the basis of the respondents alone is equivalent 
to assigning the average of nonresponses a value equal to the estimated 
average. 
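The equivalence stated at the end of this paragraph can be checked directly: filling every nonresponse with the respondents’ average leaves the overall average equal to the respondent average. A minimal sketch, assuming `None` marks a nonresponse (a representation chosen for illustration; the function names are hypothetical):

```python
def respondent_mean(values):
    """Average over respondents only; None marks a nonresponse."""
    responses = [v for v in values if v is not None]
    if not responses:
        raise ValueError("no responses")
    return sum(responses) / len(responses)


def mean_with_imputed_average(values):
    """Assign each nonresponse the respondent average, then average all elements."""
    fill = respondent_mean(values)
    completed = [fill if v is None else v for v in values]
    return sum(completed) / len(completed)


sample = [3, None, 5, None, 10]
print(respondent_mean(sample), mean_with_imputed_average(sample))  # 6.0 6.0
```

Whatever the nonrespondents’ actual values, the respondent-only estimate therefore behaves as if each of them had reported exactly the respondent average.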

The concept of “individual true values.” In defining the “value to be
estimated” we shall define a true value for each of the individuals who 
make up the population and define the value to be estimated as an average 
or aggregate of these individual true values. The individual true value 
will be conceived of as a characteristic of the individual quite independent 
of the survey conditions which affect the individual response. Thus, age 
is usually defined as a time interval between two events, and this definition 
is quite independent of how we determine an individual’s age. It should 
be remembered, however, that the number you get when you ask a person 
his age is not necessarily the true value for the age as defined. The 
respondent may not know his “true” age. Sometimes he does not know 
exactly the age of his wife or others for whom he may report. Even if 
he does know the correct answer, he may misunderstand the question or 
become confused in “recall,” or he may purposely give an incorrect 
answer. 

Difficulties of ascertaining individual true values. For some variables
(e.g., age or sex) a survey may get the true values for a large proportion of
the individuals. For other variables (e.g., income, brand preference, or
purchases) the true values may be obtained for a much smaller proportion
of the population. A survey rarely gets the true values for all the
individuals regardless of the characteristic measured. Frequently, by a
sufficient expenditure of well-directed effort we can come much nearer to
the true value. For example, in many countries the determination of age
could involve an examination of birth or baptismal certificates, or of
primary school records if no birth certificate exists, or of the first census
in which the individual was listed if neither birth certificate nor primary
school record exists. Exhaustive record searches might give the true age
for most individuals, although there would obviously be persons for
whom we could find no records and other individuals whose records were
in error. The searches would, of course, be relatively expensive compared
with methods ordinarily used for determining age.

Criteria for a definition of true value. There are many cases in which 
we might encounter tremendous difficulty in defining a “true” value 
(entirely apart from the problem of determining the value once we have 
defined it). What, for example, is a person’s “true intelligence,” “true
attitude toward revision of the Taft-Hartley Act,” or “true brand preference
for cigarettes”? No definitive answer can be given to these questions.
We would suggest, however, three criteria for the definition of “true 
value”: 

(1) The true value must be uniquely defined. 

(2) The true value must be defined in such manner that the purposes 
of the survey are met. 

(3) Where it is possible to do so consistently with the first two criteria, 
the true value should be defined in terms of operations which can 
actually be carried through (even though it might be difficult or 
expensive to perform the operations). 

It is possible to define true value in such manner that a survey is subject
to no (or negligible) response error. It will be useful to consider such
definitions of “true value” in the light of the criteria listed above. For
example, we could define a person’s “attitude toward revision of the
Taft-Hartley Act” as the alternative (in a set of six alternatives) that he
first selects after an accredited interviewer for the survey has asked him:
“What do you think of the Taft-Hartley Act?” We could define a
person’s birthplace as the answer recorded for him by an interviewer who
is instructed to ask: “In what state or foreign country were you born?”
These definitions meet (or, with a little expansion, can be made to meet)
two of the three criteria: i.e., they are unique and are defined in terms of
operations which can be carried through. In most cases, however, they
will not be acceptable as “true values.” There might, perhaps, be survey
directors who would accept these definitions as the things they really
want to measure, but most consumers of data are after something less
dependent on the particular interview conditions (even though results of
this type may be quite acceptable as approximations to the true value).
We may want to know how a person is likely to act toward a Congressman
who favored or opposed the Taft-Hartley Act, not what his casual reply
is to a rather vague question asked by a person whose motives and
sponsorship may generate a very complex reaction in the respondent.
We may want to know where a person was actually born, not what gets
recorded as his birthplace when the interviewer fails to ask the question
properly, or the respondent misunderstands the question, or the interviewer
misinterprets the answer.

Use of an expected response value to approximate the true value. In the 
examples cited (and in many other cases) it may be impossible to define 
a true value which meets all of the three criteria listed. Often, however, 
we can define a value which meets the first two criteria and can at least 
define an operation whose “expected value” will give a satisfactory 
approximation to the true value. An example is a study done by the 
Bureau of the Census. After the 1950 Census of Population was completed
by the large number of personnel hired as enumerators, carefully
selected and highly trained interviewers recanvassed a sample of areas, 
taking with them a record of the original enumeration, looking carefully 
for persons missed in the original enumeration, and checking a sample of 
those persons who were enumerated in the area to make sure that they 
should have been enumerated. The individuals who did the recanvass 
were (in general) well-trained, conscientious, and thoroughly familiar 
with the rules that prescribe which persons are to be enumerated in a 
given enumeration district. The recanvass procedure did not, of course, 
insure a “perfect” measurement for each individual, but it came nearer 
to doing so than the procedure used originally. 

Consider interviewing each individual a large number of times under 
exactly the same conditions as the recanvass. This would yield a population
of responses for all individuals. We might draw a sample of
individuals and then a sample of one of the possible responses from each
of the individuals in the sample. In practice, the conditions for subsequent
interviews might change because of the conditioning effect of earlier
interviews, but we can conceive of a set of independent recanvass interviews
of a respondent and can regard the particular interview made on
the recanvass as a sample from this set. The expected value of an 
estimate from this sample could be regarded as approximating the true 
value. For a reasonably large set of such observations, the estimates 
made from the recanvass would then be close to the “true population 
count.” 





The concept of an individual response error. The term individual 
response error will be used here to denote the difference between an 
individual observation and the true value for the individual. For 
example, the survey might want age as of last birthday as a difference in 
whole years between date of birth and some specified date (say April 1, 
1950). If one of the persons covered by the survey was born April 1,
1897, but in 1950 is reported as 50 years old, the “individual response 
error” would be 3 years. 

A less obvious case of response error is the failure to report an 
individual in a census of population (or in a sample survey used to
estimate total population). Here the “true” value (the value the census 
is trying to obtain) is 1 (1 person), the value obtained for this individual 
is 0, and the response error is 1. Since the direction of error may be 
important, it would be better to call this an error of —1. Similarly, 
counting the same individual twice would be an error of +1. 
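The sign convention of the last two paragraphs (observation minus true value) can be written out as a short sketch. Applied to the age example it gives an error of $-3$ years (reported 50, true 53), the 3-year understatement described above; a missed person gives $-1$ and a double count $+1$. The function names are hypothetical.

```python
from datetime import date


def age_last_birthday(born, as_of):
    """Completed years between the birth date and the reference date."""
    years = as_of.year - born.year
    if (as_of.month, as_of.day) < (born.month, born.day):
        years -= 1  # birthday not yet reached in the reference year
    return years


def response_error(observed, true_value):
    """Individual response error: observed value minus true value."""
    return observed - true_value


true_age = age_last_birthday(date(1897, 4, 1), date(1950, 4, 1))
print(true_age, response_error(50, true_age))  # 53 -3
# Population count: each person's true value is 1.
print(response_error(0, 1), response_error(2, 1))  # -1 1
```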

Variance and bias of response errors. As here defined, an “individual 
response” is the value obtained on a particular observation (e.g., the
result obtained in a specified measurement or interview by a specified 
interviewer with a specified respondent at a given time). Under slightly 
different conditions, therefore, the value of the individual response might 
be different. Thus, the individual response is influenced by the conditions 
of the observation or interview or written response. 

The variability of individual responses has often been treated in terms 
of random variation. Although this approach has certain defects, we 
shall adopt it for purposes of the present analysis. Consequently, the 
response error of a particular individual in a given survey will be thought
of as having an expected value (the individual response bias) and a
random component of variation around that expected value. Similarly,
the aggregate or average of a set of responses for different individuals will 
have a response bias and a response variance which will be determined 
by the response biases and variances for the population of individuals. 

Essential conditions of a survey. To say that an individual response is 
a random variable is not, however, sufficient—we must define somewhat 
more precisely the universe of individual responses involved. For this 
purpose we shall consider all responses obtainable under certain 
“essential” conditions. In general, these conditions are “specified” 
(either implicitly or explicitly) by the survey design. As a minimum, a
survey design must specify the subject of inquiry, the method of obtaining 
information (interview, mail inquiry, direct observation, etc.), and the 
method of recording the information (checking a box, entering a figure, 
writing a description of the response, etc.). These specifications may be 
general or specific. 





Particular surveys may involve additional specifications, e.g., that the 
survey be taken during a particular period. There are also essential
conditions of a survey which arise implicitly as necessary consequences of 
the explicitly specified conditions. For example, if we specify that a 
survey of individual income received during 1949 be taken during April 
1950, there is implicit in that specification a certain recall situation for 
each respondent and a relationship of this recall situation to income- 
tax-filing activities. If we also specify that responses be obtained by 
interview, the fact that the survey is to be done in April 1950 implicitly 
specifies a certain condition of the labor market and this may impose 
restrictions on the type of interviewer obtainable. The compensation 
paid and training given to interviewers, the wording of questions to be 
asked, and the sponsorship of the survey are frequently a part of the 
specified survey conditions, and these specifications determine, in turn, 
other conditions which will distinguish this response situation from other 
response situations. 

On the other hand, there are usually present, at the time of any response, 
conditions which may affect that response but which are neither specified 
survey conditions nor the direct consequences of specified survey conditions.
If the survey design specifies the types of interviewers, the sponsorship
of the study, the compensation offered, and the hiring procedures
used, these specifications may make it certain that John Jones will be 
interviewed by one of a certain class of individuals (e.g., persons over 30 
years of age who have had at least 2 years of high school education and 
some experience as interviewers for other surveys), but the exact identity 
of the interviewer may still vary within the limits of the specified class. 
The survey design may instruct the interviewer to ask certain questions, 
but it cannot insure that the questions will always be asked in exactly the 
same way. The survey design may specify a certain approach to respondents,
but it will not specify how that approach will be received by a
respondent who happens to be interrupted while she is doing the family 
laundry. 

In general, the survey specifications (explicit or implicit) restrict the 
range of response variation but by no means eliminate variation completely.
Under some conditions the range of variation may be narrow;
under others it may be wide. Similarly, the response errors may be 
compensating in character or they may be more or less systematic in 
direction, thus creating a response bias. The expected value of the 
response errors and the random component of variation around that 
expected value may be regarded as determined by the essential survey 
conditions. 

In practice, some of the essential conditions of a survey will be difficult
to separate from the unessential ones, but the fact that some are essential
and others are of an accidental character needs to be recognized.
Basically, the “essential conditions” of a survey are those variables which
we are consciously trying to keep uniform over all cases covered. The
“uniformity” may be in a rule rather than being absolute (e.g., in a study
of sex behavior, we might require that female interviewers interview
female respondents and male interviewers interview males), but the
important point is that we deliberately attempt to bring these conditions
“under control” (or are forced by circumstances to accept a uniformity
in certain conditions). Often the problem of improving survey design
will be to identify and deal with some of the more important essential
conditions.

In contrast to the essential conditions of the survey, the “random 
errors” are controlled not by the introduction of uniform rules and 
procedures but by taking several units—several clusters, several elements 
for the variables discussed in earlier chapters, several interviewers for the 
“interviewer variance” discussed in this chapter, or several coders or 
punchers when we are dealing with “coding or punching variance.” The 
use of “scores” based on several questions in attitude surveys and other 
psychological measurements is another example of the control of random 
variation by increasing the number of units (in this case the number of 
questions). 

Correlation of response errors when interviewers are used. It would be 
convenient to assume that in any particular survey the random component 
of the response error for one individual is uncorrelated with the random 
component of the response error for another individual. Unfortunately, 
such an assumption does not accord with known facts about response 
variation. In particular, a mathematical model which postulates independent
responses of all individuals will not fit a survey which uses interviewers
unless the interviewer is assumed to have no influence on the
response. If we were to assign at random a different interviewer to each 
individual, the effect of the interviewer on the response would be 
uncorrelated for any two obtained responses. Ordinarily, however, a 
given interviewer obtains and records the responses for a number of 
individuals, and often we have reason to believe that the errors made by 
a particular interviewer are correlated. Even casual observation of an 
interviewer at work reveals the presence of interviewing patterns distinctive 
to that interviewer. In an inquiry about labor force status an interviewer, 
who implies by his manner that he does not expect to find a housewife 
with children gainfully employed, may tend to record fewer employed 
women than an interviewer who seems to insist that every adult should 
be gainfully employed. 
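Why correlated interviewer errors matter can be seen from a simplified additive model. This model is an assumption for illustration, not the model this chapter develops: each response is a true value drawn with variance $\sigma_t^2$, plus an effect shared by all cases of the same interviewer (variance $\sigma_b^2$), plus an independent response deviation (variance $\sigma_e^2$). Interviewers are drawn from a large pool and finite-population corrections are ignored. With $m$ interviewers each taking $\bar{n}$ cases ($n = m\bar{n}$), the variance of the sample mean is then $(\sigma_t^2 + \sigma_e^2)/n + \sigma_b^2/m$, so the interviewer component shrinks only with the number of interviewers, not with the number of cases. The sketch compares that formula with a simulation; the names are hypothetical.

```python
import random


def var_of_mean(s2_true, s2_resp, s2_int, m, nbar):
    """Variance of the sample mean under the assumed additive interviewer model."""
    n = m * nbar
    return (s2_true + s2_resp) / n + s2_int / m


def simulate_var_of_mean(s2_true, s2_resp, s2_int, m, nbar, reps=1500, seed=2):
    """Monte Carlo variance of the mean of m * nbar responses."""
    rng = random.Random(seed)
    sd_true, sd_resp, sd_int = s2_true ** 0.5, s2_resp ** 0.5, s2_int ** 0.5
    means = []
    for _ in range(reps):
        total = 0.0
        for _ in range(m):
            b = rng.gauss(0.0, sd_int)  # shared by every case of this interviewer
            for _ in range(nbar):
                total += rng.gauss(0.0, sd_true) + b + rng.gauss(0.0, sd_resp)
        means.append(total / (m * nbar))
    mu = sum(means) / reps
    return sum((v - mu) ** 2 for v in means) / (reps - 1)


# Same n = 200 cases, split among 10 or among 40 interviewers.
for m, nbar in [(10, 20), (40, 5)]:
    print(m, nbar, var_of_mean(1.0, 0.5, 0.05, m, nbar),
          simulate_var_of_mean(1.0, 0.5, 0.05, m, nbar))
```

With these made-up variances, spreading the same 200 cases over 40 interviewers instead of 10 lowers the formula value from 0.0125 to 0.00875, even though the sample size is unchanged.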








The present analysis uses a mathematical model which assumes that 
responses are uncorrelated if they are obtained for different individuals by 
different interviewers. However, there may be correlation between 
responses even when both the individual and the interviewer are different. 
For example, the presence of a common supervisor or participation in the 
same training class may result in correlated errors for two different 
interviewers (unless these common influences are specified as essential 
conditions). Correlation between responses obtained by different interviewers
may also be introduced in processing; e.g., the same clerk may
make similar errors in coding both responses. We shall assume that these 
correlations are small and can be neglected, although the model could be 
extended to include them. 

Specification of a mathematical model.* The discussion thus far
presented leads to a mathematical model for the analysis of response 
errors in which we have: 

(a) A population of $N$ individuals and a population of $M$ interviewers,
both of which will, for convenience, be assumed to be large.

(b) Associated with each individual, a true value.

(c) A set of essential survey conditions which determine for a particular
individual and interviewer the expected value of a random variable.

(d) Zero correlation between the random components of responses for
two different individuals with two different interviewers.

(e) The order of interviewing respondents by an interviewer either
randomly determined or not affecting the responses.

In many surveys interviewers are available to interview only certain
classes of the population and only in certain geographic areas. We shall,
therefore, conceive of our interviewers as divided into $L$ groups, with $M_h$
interviewers in the $h$th group who are available to interview a particular
$N_h$ individuals and no others. Where all interviewers are available to
interview all individuals, $L = 1$, $M_h = M$, and $N_h = N$.


3. The effect of interviewers on the variance of sample estimates. Effect
of response errors on estimates of sample variance. One major advantage
possessed by probability sampling as compared with the other types
of sampling is the possibility of estimating the sampling error from
the sample. In situations where the sample results are uniquely determined
by the act of selecting the sample individuals (i.e., given the fact
that the ith individual is in the sample, there is one and only one
value to be ascribed to the ith individual) there is, of course, no
question of our ability to estimate sampling error from a reasonably
large sample.

* See also Sec. la of this chapter.

Sec. 3 EFFECT OF INTERVIEWERS ON VARIANCE 289

When the individual responses are subject to error, we shall see that, 
with appropriate methods, the sampling variance of a statistic such as a 
mean or total will reflect the response variation as well as the error due to 
including only a sample of individuals. Appropriate analysis of the 
sources of error will point to the methods for minimizing the total variance. 
However, although the use of probability sampling will insure that the 
variance of the individual true values will be appropriately reflected in the 
variance of a sample estimate, the accurate reflection of response variance 
will depend on the applicability of whatever mathematical model is 
assumed. 

The response bias of a statistic such as an estimated mean or total will 
not be reflected in the variance of a sample statistic, although its effect, 
if it can be estimated, will be reflected in the mean square error and its 
influence on accuracy thus taken into account. Response bias is not 
per se a “sampling” problem, i.e., bias arising from response errors is 
ordinarily independent of the sample design and is, in fact, of the same 
magnitude for a study involving a complete canvass of the population as 
it is for a sample survey if both the complete canvass and the sample 
survey are taken under the same essential conditions. Postponing to a 
later section the consideration of response bias, we shall examine first the 
other component of survey error, i.e., the variance of a sample estimate, 
and shall examine particularly the contribution of the interviewer to this 
variance. 
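The point that response bias is not reduced by enlarging the sample, even to a complete canvass, while the variance is reduced, can be illustrated with a small simulation. Everything in it is hypothetical: a population of 500 true values, a fixed response bias for each individual (averaging about $-2$) that stays the same under the same essential conditions, and an independent response deviation on every interview. Both a sample of 100 and a complete canvass show an average error close to the mean individual bias; only the variance differs.

```python
import random


def make_population(N, seed=3):
    """Hypothetical true values and fixed individual response biases."""
    rng = random.Random(seed)
    true_vals = [rng.gauss(50.0, 10.0) for _ in range(N)]
    biases = [rng.gauss(-2.0, 1.0) for _ in range(N)]
    return true_vals, biases


def bias_and_variance(true_vals, biases, n, reps=1000, sd_resp=5.0, seed=4):
    """Average error and variance of the mean of n responses over repeated surveys."""
    rng = random.Random(seed)
    N = len(true_vals)
    target = sum(true_vals) / N  # the value to be estimated
    estimates = []
    for _ in range(reps):
        chosen = rng.sample(range(N), n)  # n == N is a complete canvass
        responses = [true_vals[k] + biases[k] + rng.gauss(0.0, sd_resp) for k in chosen]
        estimates.append(sum(responses) / n)
    mean = sum(estimates) / reps
    var = sum((e - mean) ** 2 for e in estimates) / (reps - 1)
    return mean - target, var


true_vals, biases = make_population(500)
print("mean individual bias:", sum(biases) / len(biases))
for n in (100, 500):
    print(n, bias_and_variance(true_vals, biases, n))
```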

The design of a survey to evaluate response variance due to interviewers. 
In evaluating sampling variance we must consider the particular technique 
of drawing a sample and making an estimate from this sample. In studies 
which involve the use of interviewers we must consider also some specified 
technique for selecting the interviewers and assigning them to the various 
individuals included in the sample. 

Actually, survey practice in the making of interviewer assignments is
far from standard. A common pattern is to group the units selected in
the sample by geographic areas and then to assign the units in a given area
to one or more interviewers, making the different interviewers’ assignments
approximately equal. The sampling units may be individuals or
clusters, but in either event, in surveys in which interviewers are used,
costs of travel and time required for identification of the sample usually
suggest some clustering of the assignments to interviewers. This clustering
of interviewer assignments led to the introduction of interviewer groups
into the population specifications outlined above. In terms of the specified
mathematical model, this practice can be approximated by the following
sample design:

(a) $n$ of the $N$ individuals in the population are selected at random
without restriction.*

(b) $m_h$ interviewers are selected at random without restriction from the
$h$th interviewer group to interview those sample individuals selected
who are available for interview by this interviewer group. Let

$$m = \sum_{h=1}^{L} m_h = \text{the total number of interviewers selected.}$$

(c) An equal number, $\bar{n}$, of individuals is assigned to each of the $m$
interviewers. The $\bar{n}$ individuals assigned to any interviewer are a
random subsample of all the sample individuals available for
interview by this interviewer group.

The applicability of these conditions to actual surveys will be considered 
later. The conditions stated apply reasonably well to many surveys. 

It should be noted that \(n_h\), the number of sample cases drawn which
will be available for interview only by interviewers in the hth group, is a
random variable. In designing a survey we could decide to use a fixed
number of interviewers from the hth group and adjust the size of assignment
given each interviewer. For example, if we were using 2 interviewers
for a given group and happened to draw 84 sample cases available to this
group, we could give each interviewer 42 cases; if we drew 76 individuals,
each interviewer would be assigned 38, etc. Another method of determining
interviewer assignments is the one used here, i.e., to fix the size
of the assignments and let the number of interviewers vary. The restriction
that the size of the interviewer assignment be fixed does not represent
any great loss of generality, since the variance of most sample estimates
will be about the same whether the size of assignment or the number of
interviewers in a group is fixed.

The sample estimate and its mean square error.† Assume that a simple
random sample of n units is selected from a population of N units. Let
\(x_{hij}\) = the value obtained for the jth sample unit by the ith sample
interviewer in the hth (population) group. Let the sample mean be

\[\bar x = \frac{1}{n}\sum_{h}^{L}\sum_{i}^{m_h}\sum_{j}^{\bar n} x_{hij} \tag{3.1}\]

* We shall restrict this discussion to simple random sampling. The results 
can be extended to stratified and cluster sampling. 

† See Sec. 7b for derivation of the formulas presented in this section.



Sec. 3 EFFECT OF INTERVIEWERS ON VARIANCE 291 

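With the survey design specified, \(\bar x\) would be used as an estimate of the
true population mean \(\bar Y\). The mean square error of \(\bar x\) is:

\[\mathrm{MSE}_{\bar x} = \sigma_{\bar x}^2 + (E\bar x - \bar Y)^2 \tag{3.2}\]

where \(E\bar x\) = the expected value of \(\bar x\).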

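Section 7b shows the derivation of an expression for \(\sigma_{\bar x}^2\) as
approximately* equal to:

\[\sigma_{\bar x}^2 \doteq \frac{\sigma_x^2 - \sigma_{xI}}{n} + \frac{\sigma_{xI}}{m} = \frac{\sigma_x^2}{n} + \frac{(\bar n - 1)\,\sigma_{xI}}{n} \tag{3.3}\]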


Here \(\sigma_x^2\) represents the “total variance” of individual responses around
the mean of all individual responses in the population; i.e., it is the
variance over all responses for all individuals to all interviewers in the
interviewer group, and over all interviewer groups, and \(\sigma_{xI}\) is the
covariance between responses obtained from different individuals by the
same interviewer (this covariance being taken within interviewer groups
since independent selections of interviewers are made from each interviewer
group). If we divide the covariance \(\sigma_{xI}\) by the average variance
\(\sigma_{wx}^2\) of responses within interviewer groups, we have \(\delta\), the intraclass
correlation; i.e., \(\delta\) is the correlation between responses of different
individuals for the same interviewer. Thus:


\[\sigma_x^2 = \sigma_{wx}^2 + \sigma_{bx}^2 \tag{3.4}\]

\[\delta = \frac{\sigma_{xI}}{\sigma_{wx}^2} \tag{3.5}\]

where \(\sigma_{wx}^2\) is the variance of responses within interviewer groups (taken
over all responses of every individual to every interviewer
in the group).

\(\sigma_{bx}^2\) is the variance of expected responses for interviewer groups,
i.e., between average values for interviewer groups.

The sampling design used is, in effect, to sample “clusters” of responses
(the responses obtained by each of the interviewers) and, within sample
clusters, to subsample \(\bar n\) responses (such that no two responses are for the
same individual). The similarity to cluster sampling may be more
apparent if we express \(\sigma_{\bar x}^2\) as

\[\sigma_{\bar x}^2 \doteq \frac{\sigma_{bx}^2}{n} + \frac{\sigma_{wx}^2}{n}\left[1 + \delta(\bar n - 1)\right] \tag{3.6}\]

* Assuming that N is large relative to n and that the interviewers used are
a random sample from a potentially infinite supply of such interviewers.

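In the above expressions, \(\sigma_{bx}^2/n\) represents the variance arising because
individuals were sampled independently of the interviewer groups. If we
had only one interviewer group (L = 1), \(\sigma_{bx}^2 = 0\) and \(\sigma_{wx}^2 = \sigma_x^2\), so that

\[\sigma_{\bar x}^2 \doteq \frac{\sigma_x^2}{n}\left[1 + \delta(\bar n - 1)\right] \tag{3.7}\]

and if we sampled individuals within interviewer groups, so that the
interviewer groups served as strata, \(\sigma_{\bar x}^2\) would be given by Eq. 3.7, but
with \(\sigma_{wx}^2\) substituted for \(\sigma_x^2\).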

This formula is identical with that for the variance of a sample mean
when we draw m clusters of \(\bar n\) elements each (or sample m clusters of equal
size and subsample \(\bar n\) elements from each cluster). There are, of course,
differences from straight cluster sampling arising from the restriction that
we must sample only one response for any individual, but the basic
sampling principles are analogous.
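A small simulation can make Eq. 3.7 concrete. The sketch below is illustrative only. It assumes an additive response model that the text does not state: each response is an individual value plus an interviewer effect shared by all \(\bar n\) respondents in that interviewer's assignment. It also assumes a single interviewer group (L = 1) and normal distributions. Under those assumptions \(\sigma_{xI}\) is simply the variance of the interviewer effects. The function names are hypothetical.

```python
import random
import statistics

def simulate_xbar(m, n_bar, sd_individual, sd_interviewer, rng):
    """One replicate of the sample mean under the assumed additive model."""
    total = 0.0
    for _ in range(m):
        effect = rng.gauss(0.0, sd_interviewer)   # shared by this interviewer's whole assignment
        for _ in range(n_bar):
            total += rng.gauss(0.0, sd_individual) + effect
    return total / (m * n_bar)

def eq_3_7(m, n_bar, sd_individual, sd_interviewer):
    """Eq. 3.7: sigma_x^2 / n * [1 + delta * (n_bar - 1)]."""
    n = m * n_bar
    sigma2_x = sd_individual**2 + sd_interviewer**2   # total variance of a response
    delta = sd_interviewer**2 / sigma2_x              # intraclass correlation
    return sigma2_x / n * (1 + delta * (n_bar - 1))

def empirical_variance(m, n_bar, sd_individual, sd_interviewer, reps=4000, seed=1953):
    """Variance of x-bar over many independent replicates of the survey."""
    rng = random.Random(seed)
    draws = [simulate_xbar(m, n_bar, sd_individual, sd_interviewer, rng) for _ in range(reps)]
    return statistics.variance(draws)

m, n_bar, sd_i, sd_b = 10, 5, 1.0, 0.5
theory = eq_3_7(m, n_bar, sd_i, sd_b)                 # 0.045
naive = (sd_i**2 + sd_b**2) / (m * n_bar)             # 0.025, ignores the interviewers
simulated = empirical_variance(m, n_bar, sd_i, sd_b)
print(f"Eq. 3.7: {theory:.4f}  simulated: {simulated:.4f}  naive: {naive:.4f}")
```

The simulated variance should agree with Eq. 3.7 to within sampling error, and it is well above the value \(\sigma_x^2/n\) obtained when the interviewer contribution is ignored.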

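Estimates of variance from the sample.* From the sample we can
obtain unbiased estimates of \(\sigma_{xI}\) and \(\sigma_x^2\). These estimates are,
respectively,

\[s_{xI} = \frac{\sum_h^L \sum_i^{m_h} (\bar x_{hi} - \bar x_h)^2}{m - L} - \frac{\sum_h^L \sum_i^{m_h} \sum_j^{\bar n} (x_{hij} - \bar x_{hi})^2}{n(\bar n - 1)} \tag{3.8}\]

\[s_x^2 = \frac{\sum_h^L \sum_i^{m_h} \sum_j^{\bar n} (x_{hij} - \bar x)^2}{n - 1} + \frac{(n - m)}{(n - 1)m}\, s_{xI} \tag{3.9}\]

where \(\bar x_{hi}\) is the average for the ith sample interviewer in the hth group
and \(\bar x_h\) is the sample average for the hth group:

\[\bar x_{hi} = \frac{1}{\bar n}\sum_j^{\bar n} x_{hij} \tag{3.10}\]

\[\bar x_h = \frac{1}{m_h \bar n}\sum_i^{m_h}\sum_j^{\bar n} x_{hij} \tag{3.11}\]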


* For derivations of the formulas in this and the following sections see 
Sec. 7c. 






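Thus, an unbiased estimate of \(\sigma_{\bar x}^2\) is

\[s_{\bar x}^2 = \frac{\sum_h^L \sum_i^{m_h} \sum_j^{\bar n} (x_{hij} - \bar x)^2}{n(n - 1)} + \frac{(n - m)}{(n - 1)m}\, s_{xI} \tag{3.12}\]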


Contribution of interviewers to the variance. It should be noted that the
effect of using interviewers is to introduce into the variance of \(\bar x\) a term
involving the intraclass correlation within interviewers’ assignments. Except
as indicated below, the techniques given in Vol. I and in earlier chapters
of Vol. II for estimating the variance of \(\bar x\) (and of other sample estimates)
from a sample disregard the contribution of this intraclass correlation.
If we disregard the intraclass correlation within interviewers’ assignments,
the estimate of the variance of \(\bar x\) would be the first term of Eq. 3.12. It
can be seen that the result will usually be an underestimate in cases where
there is a noticeable interviewer contribution to the total variance, since the
omitted second term is positive whenever \(s_{xI} > 0\) and m < n.

If m = n, each interviewer interviews only one unit, and there is, of
course, no effect of the intraclass correlation. In this case the formulas
for the variance given in earlier chapters include any interviewer
contribution to the variance.* On the other hand, the larger the interviewer
error, or the more individuals we assign to an interviewer, the more
important it is to use Eq. 3.12 to estimate the variance of \(\bar x\).

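Equation 3.12 is useful in indicating the effect of interviewer error upon
the variance of a sample mean. Where there is no need for a separate
estimate of \(\sigma_{xI}\), Eq. 3.12 can be written as

\[s_{\bar x}^2 = \frac{(n - L)\sum_h^L \sum_i^{m_h} (\bar x_{hi} - \bar x_h)^2}{m(n - 1)(m - L)} + \frac{\sum_h^L m_h (\bar x_h - \bar x)^2}{m(n - 1)} \tag{3.13}\]

and, if L = 1, as

\[s_{\bar x}^2 = \frac{\sum_i^{m} (\bar x_i - \bar x)^2}{m(m - 1)} \tag{3.14}\]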
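Eqs. 3.8, 3.9, 3.12, and 3.13 can be computed together. The sketch below assumes the responses are held as nested lists, data[h][i] being the \(\bar n\) responses obtained by the ith sampled interviewer of group h; this layout, the function name, and the toy figures are illustrative only. Computing \(s_{\bar x}^2\) both by Eq. 3.12 and by Eq. 3.13 gives a numerical check that the two forms agree.

```python
from statistics import fmean

def interviewer_variance_estimates(data):
    """Return s_xI (Eq. 3.8), s_x^2 (Eq. 3.9), and s^2 of x-bar by Eqs. 3.12 and 3.13.

    data[h][i] is the list of responses obtained by interviewer i of group h.
    Every assignment must have the same size n_bar > 1, and m must exceed L.
    """
    n_bar = len(data[0][0])
    if n_bar < 2 or any(len(a) != n_bar for grp in data for a in grp):
        raise ValueError("assignments must all have the same size n_bar > 1")
    L = len(data)
    m = sum(len(grp) for grp in data)
    if m <= L:
        raise ValueError("need more interviewers than interviewer groups")
    n = m * n_bar
    x_bar = fmean(x for grp in data for a in grp for x in a)
    x_hi = [[fmean(a) for a in grp] for grp in data]        # Eq. 3.10
    x_h = [fmean(means) for means in x_hi]                  # Eq. 3.11 (equal n_bar)

    ss_total = sum((x - x_bar) ** 2 for grp in data for a in grp for x in a)
    ss_within = sum((x - fmean(a)) ** 2 for grp in data for a in grp for x in a)
    ss_between = sum((xi - xh) ** 2 for means, xh in zip(x_hi, x_h) for xi in means)
    ss_groups = sum(len(means) * (xh - x_bar) ** 2 for means, xh in zip(x_hi, x_h))

    s_xI = ss_between / (m - L) - ss_within / (n * (n_bar - 1))        # Eq. 3.8
    factor = (n - m) / ((n - 1) * m)
    s2_x = ss_total / (n - 1) + factor * s_xI                           # Eq. 3.9
    s2_xbar = ss_total / (n * (n - 1)) + factor * s_xI                  # Eq. 3.12
    s2_xbar_313 = ((n - L) * ss_between / (m * (n - 1) * (m - L))
                   + ss_groups / (m * (n - 1)))                         # Eq. 3.13
    return s_xI, s2_x, s2_xbar, s2_xbar_313

# Toy data: two interviewer groups, assignments of 3 individuals each.
groups = [[[3, 5, 4], [6, 8, 7], [2, 4, 3]],
          [[10, 12, 11], [9, 9, 12]]]
s_xI, s2_x, v312, v313 = interviewer_variance_estimates(groups)
print(f"s_xI = {s_xI:.4f}, s_x^2 = {s2_x:.4f}, Eq. 3.12 = {v312:.4f}, Eq. 3.13 = {v313:.4f}")
```

With a single group (L = 1) the second term of Eq. 3.13 vanishes and the result reduces to Eq. 3.14, the familiar between-cluster estimate.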


* A study director would not ordinarily assign a single individual to an
interviewer, but, with cluster samples, a single cluster might be assigned to one
or more interviewers. Where an interviewer or group of interviewers is assigned
to a single cluster, the estimated variance between ultimate clusters, by the
usual analysis, will include interviewer variance. Thus, with large-cluster
sampling, discussed in Chapter 9, it may often be true that one or more
interviewers will work in only a single primary sampling unit. In this event, the
ultimate cluster estimate of the total variance will include the appropriate
interviewer contribution to the total variance (see, for example, Secs. 4 and 5,
Ch. 9, and Secs. 15 and 28, Ch. 9, Vol. I). Where more than one cluster is
assigned to an interviewer, we can estimate the interviewer variance separately
by the methods given here.




The covariance, \(\sigma_{xI}\), reflects the effect on response of the interviewer
and the interaction between interviewer and respondent. If the observation
is of a type which permits a large effect of the interviewer on the
response, the contribution of the term involving \(\sigma_{xI}\) to the variance of \(\bar x\)
may be quite substantial. In many cases, however, \(\sigma_{xI}\) will be negligible.
We might, for example, expect a large variance among interviewers for
estimates of farm acreage under corn, where the interviewer does the
estimating by direct observation without measurement. Where the
farmer furnishes information to the interviewer on the number of cattle
on a farm, there may be little or no effect of the interviewer on the response
and \(\sigma_{xI}\) may be negligible or zero.

Reducing interviewer contribution to variance. Where interviewer 
contribution to the variance is important, it may be possible to reduce this 
contribution significantly by training and adequately supervising the 
interviewers, and this should be the first line of attack. Sometimes, 
however, training of the interviewers beyond a certain point will have 
very little effect on interviewer variances. Instead of trying to make 
additional reductions in interviewer variance by increased training and 
supervision of the interviewer or using other (and perhaps expensive) 
techniques to obtain greater interviewer uniformity, we might devote our 
attention to another method for reducing the effect of interviewer variance 
on our final estimate. From Eq. 3.3 it will be seen that, for fixed values
of \(\sigma_{xI}\), the effect of interviewer variance on \(\sigma_{\bar x}^2\) decreases as we increase
the number of interviewers. Thus, if cost were not a factor, maximum
accuracy with this sample design would be obtained by assigning one
individual to each interviewer.*

Determining the optimum number of interviewers. With the ordinary
survey which has a fixed total budget, increasing the number of
interviewers will increase costs and will require a reduction of expenditure at
some other point, e.g., reducing the expenditure per interviewer or per
individual or reducing the number of individuals included in the sample.
When the cost function is simple, as in Eq. 3.15, optimum values of n
and m can be readily determined by joint solution of the cost and variance
functions. With more complicated cost functions, the optimum values
can be determined in the same way as for subsampling designs (see
Chapters 6, 7, and 9).


* This statement is subject to the limitation that response bias and variance 
between interviewers remain fixed. Ordinarily, it will not be possible to make 
extreme changes in size of interviewer assignment without changing the response 
bias and interviewer variance, but the analysis is acceptable within reasonable 
limits of variation. 


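We shall consider the case in which the cost is given by

\[C = C_1 n + C_2 m \tag{3.15}\]

where C = total budget for the survey, excluding fixed overhead costs.
\(C_1\) = cost per elementary unit included in the sample.
\(C_2\) = cost per interviewer used in the survey.

With this cost function and the variance given by Eq. 3.3, the optimum
values of n and m are

\[n = k\sqrt{\frac{\sigma_x^2 - \sigma_{xI}}{C_1}} \tag{3.16}\]

\[m = k\sqrt{\frac{\sigma_{xI}}{C_2}} \tag{3.17}\]

where

\[k = \frac{C}{\sqrt{C_1(\sigma_x^2 - \sigma_{xI})} + \sqrt{C_2\,\sigma_{xI}}} \tag{3.18}\]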
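Eqs. 3.16 to 3.18 translate directly into code. The sketch below also rounds m to the nearest integer and spends the remaining budget on n. That rounding rule is an assumption, chosen here because it reproduces the optimum columns of Table 1 from the estimates and the costs used there (C = 2000, \(C_1\) = 2, \(C_2\) = 80). The function names are hypothetical.

```python
import math

def optimum_n_m(C, C1, C2, s2_x, s_xI):
    """Unrounded optimum (n, m) of Eqs. 3.16-3.18 for the cost C = C1*n + C2*m.

    s2_x and s_xI are estimates of sigma_x^2 and sigma_xI. s_xI must be positive;
    when it is not, m should simply be made as small as possible.
    """
    if s_xI <= 0 or s2_x <= s_xI:
        raise ValueError("requires 0 < s_xI < s2_x")
    a = s2_x - s_xI
    k = C / (math.sqrt(C1 * a) + math.sqrt(C2 * s_xI))      # Eq. 3.18
    return k * math.sqrt(a / C1), k * math.sqrt(s_xI / C2)  # Eqs. 3.16, 3.17

def rounded_design(C, C1, C2, s2_x, s_xI):
    """Assumed rounding rule: nearest integer m, then as many units as the budget allows."""
    _, m = optimum_n_m(C, C1, C2, s2_x, s_xI)
    m = max(1, round(m))
    return int((C - C2 * m) // C1), m

# Table 1 estimates (s_xI, s_x^2)
for label, s_xI, s2_x in [("Jagaddal expenditure", 1.01, 171.0),
                          ("Jagaddal cereals", 0.13, 100.8),
                          ("Nagpur expenditure", 0.80, 399.1)]:
    n, m = rounded_design(2000, 2, 80, s2_x, s_xI)
    print(f"{label}: m = {m}, n = {n}")   # m = 8, n = 680; m = 5, n = 800; m = 6, n = 760
```

The unrounded optima are about 8.2, 4.6, and 5.5 interviewers, which is why the Nagpur figure can reasonably be described as 5 or 6.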

Some illustrations. Assuming that cost functions are known or can be
roughly approximated, application of the technique may be illustrated by
data from two studies where interviewer assignments were randomized.
However, for a satisfactory estimate of interviewer variance, we will need
more interviewers than the numbers used in the studies mentioned here.
The interviewer variance estimates of Tables 1 and 2 (pp. 297 and 298)
are based on a very small number of cases and are, therefore, quite
unreliable. They are presented only for purposes of illustration.

The Indian Statistical Institute has pioneered in the design of surveys 
so as to make possible the evaluation of response variation associated 
with the interviewer. Methods similar to the survey design described in 
this section have been used for some time by the Indian Statistical 
Institute to control and measure the effects of the “human agency.”
Some of these techniques are described by Mahalanobis (5). One such
design was used in an inquiry to determine the economic conditions of
factory workers in an industrial area at Jagaddal. The entire area was
divided into 5 subareas. Within each subarea 5 independent random
samples of structures were selected for interview. Each of the 5 samples 
was assigned to a different interviewer, but the same 5 interviewers 
worked in all 5 subareas. 

This design is similar to the one described above. In this case all
interviewers are (presumably) available to interview the entire population,
so that L = 1. There is a stratification within interviewers’ assignments





(the sampling by subareas). Results are presented on a “family” basis, 
although the sampling unit used was actually a structure and thus might 
involve a cluster of families. To simplify the use of Mahalanobis’ data 
for illustrative purposes, we shall ignore the stratification and clustering 
and treat the sample of families as if it were an unrestricted random 
sample of the population surveyed, the families being the individual 
members of this population. 

Mahalanobis did 3 studies in the Jagaddal area (in 1941, 1942, and
1945), all involving approximately the same design. He also reports a
study using a similar design (5 subareas but only 4 interviewers) carried
out in Nagpur in 1942-43 by M. P. Shrivastava. Table 1 shows
estimates of \(\sigma_x^2\) and \(\sigma_{xI}\) for various characteristics made from the results
of these surveys, assuming an unrestricted random sampling design.
With a suitable cost function, these variance estimates can be used to
determine the optimum number of interviewers. Suppose, for example,
that C (the total survey budget) was $2000, that \(C_1\) (the cost per family)
was $2, and that \(C_2\) (the cost per interviewer for training, supervision,
travel to the five areas to be enumerated, etc.) was $80. With these
values (and the cost function \(C = C_1 n + C_2 m\)) the optimum number of
interviewers, m, and the optimum number of families, n, would be those
shown in the last 2 columns of Table 1. The analysis would point to the
use of somewhere between 5 and 8 interviewers for the Jagaddal study
and to about 5 interviewers for the Nagpur study. It should be
remembered, however, that the estimates \(s_{xI}\) are based on 4 degrees of freedom
for the Jagaddal study and only 3 degrees of freedom for the Nagpur
study. These estimates are, therefore, subject to a high sampling variance.
As a matter of fact, the values reported for \(s_{xI}\) are entirely consistent with
a zero value for \(\sigma_{xI}\). This situation points to the need for using the
results of more interviewers if we wish to make reliable estimates of
interviewer contributions to the variance from the sample.

If the cost per interviewer, \(C_2\), had been taken as $4 instead of $80,
the optimum number of interviewers for estimating monthly per capita 
expenditures in Jagaddal would have been 49. In this case the use of 
only 5 interviewers would mean an 80 per cent increase in the variance 
of our estimate as compared with the optimum. 
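The 80 per cent figure can be checked from Eqs. 3.3 and 3.16 to 3.18. The sketch below takes the Jagaddal monthly-expenditure estimates of Table 1 and lowers \(C_2\) to $4. As an assumption about how the comparison was made, both designs spend whatever budget is left after paying the interviewers on families.

```python
import math

def var_xbar(n, m, s2_x, s_xI):
    """Eq. 3.3 evaluated with the estimates s2_x and s_xI."""
    return (s2_x - s_xI) / n + s_xI / m

C, C1, C2 = 2000, 2, 4          # budget and unit costs, interviewer cost lowered to $4
s2_x, s_xI = 171.0, 1.01        # Jagaddal monthly expenditure (Table 1)

a = s2_x - s_xI
k = C / (math.sqrt(C1 * a) + math.sqrt(C2 * s_xI))     # Eq. 3.18
m_opt = round(k * math.sqrt(s_xI / C2))                # Eq. 3.17, rounded to a whole interviewer
v_opt = var_xbar((C - C2 * m_opt) / C1, m_opt, s2_x, s_xI)
v_five = var_xbar((C - C2 * 5) / C1, 5, s2_x, s_xI)    # only 5 interviewers
increase = v_five / v_opt - 1
print(f"optimum m = {m_opt}, increase with m = 5: {increase:.0%}")
```

This prints an optimum of 49 interviewers and an increase of about 79 per cent, in line with the figure quoted above.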

A small experiment similar to those of Mahalanobis was conducted in 
Baltimore by the Bureau of the Census as part of the December 1947 
Current Population Survey. In this study, segments (small areas) were 
selected for interview in the Baltimore area. These segments had an 
expected size of 6 households. The households in 25 of the segments 
were divided into 2 sets of alternate households. Two interviewers were 
assigned to each of the 25 segments and given (at random) 1 of the sets 




of households for interview. Interviewers A and B shared 6 segments,* 
interviewers B and C shared 5, interviewers A and C shared 5, and 
interviewers D and E shared 9. 

The situation in this study is approximated reasonably well by the 
specified mathematical model if we assume that interviewers A, B, and C 
were drawn from one interviewer group and interviewers D and E from 
another. The sample design is, of course, different, but the difference 
requires only minor modifications of the formulas presented above. 


Table 1. Some examples of interviewer covariances and of optimum
determination of number of interviewers

| Study | Characteristic | Estimate of interviewer covariance, \(s_{xI}\) | Estimate of total variance, \(s_x^2\) | Optimum number* of interviewers, m | Optimum number* of individuals, n |
|---|---|---|---|---|---|
| Jagaddal, 1942 | 1. Monthly expenditure (rupees per capita) | 1.01 | 171.0 | 8 | 680 |
| | 2. Consumption of cereals (pounds per head per month) | .13 | 100.8 | 5 | 800 |
| Nagpur, 1943 | 3. Total monthly expenditures | .80 | 399.1 | 6 | 760 |

* Minimum variance subject to the cost restriction that \(2n + 80m = C = 2000\).

To determine the optimum allocation of resources for the Baltimore
study design, we let n equal the number of segments and assume costs of:

C = total budget = $400.

\(C_1\) = cost per segment (using one interviewer to cover each segment)
= $6.

\(C_2\) = cost per interviewer = $7.

Table 2 shows the values of \(s_x^2\) and \(s_{xI}\) determined from the Baltimore
study data and the optimum values of n and m with the cost function of
Eq. 3.15. In the Baltimore study 2 interviewers were assigned to each
segment. The optimum values n and m were determined for the case in
which only 1 interviewer is assigned to any segment.

* To simplify calculations, 1 of these segments was eliminated at random. 







It will be noted that \(s_{xI}\) is negative for 3 of the 5 characteristics.
Negative values of \(s_{xI}\) frequently will be obtained when \(\sigma_{xI}\) is zero
(since \(s_{xI}\) is an unbiased estimate of \(\sigma_{xI}\)) and are particularly likely to
occur when \(s_{xI}\) is based on a small number of degrees of freedom (i.e.,
relatively few interviewers), making the variance of \(s_{xI}\) relatively large.
Where \(s_{xI}\) is negative, we have taken \(s_{xI}\) as zero in estimating \(\sigma_{\bar x}^2\) and the
optimum values of n and m. In these cases, of course, the optimum
requires that m be as small as possible (i.e., m = 2, the number of
interviewer groups), since with \(\sigma_{xI} = 0\) the variance given by Eq. 3.3 does
not depend on m, while every additional interviewer adds to the cost.

Table 2. Variance estimates and optimum values of n and m for the
Baltimore study conditions

| Characteristic to be estimated | Total persons | Persons under 14 years of age | Total employed | Persons employed at nonfarm job for wages or salary | Persons operating own business or profession |
|---|---|---|---|---|---|
| Variance estimate, \(s_x^2\) | 64.5 | 4.98 | 34.3 | 44.0 | … |
| Covariance estimate, \(s_{xI}\) | 1.04 | -.68 | -.013 | 1.28 | … |
| Optimum value of n | 58 | 64 | 64 | 56 | 64 |
| Optimum value of m | 7 | 2 | 2 | 9 | 2 |
| Variance of \(\bar x\) at the optimum | 1.23 | .078 | .54 | .90 | .024 |
| Variance of \(\bar x\) with n = 64 and m = 2 | 1.51 | .078 | .54 | 1.32 | .024 |
| with n = 62 and m = 4 | 1.28 | .080 | .55 | 1.01 | .024 |
| with n = 57 and m = 8 | 1.24 | .088 | .60 | .91 | .026 |
| with n = 48 and m = 16 | 1.39 | .104 | .71 | .96 | .031 |
| with n = 31 and m = 31 | 2.08 | .161 | 1.11 | 1.42 | .049 |


For 2 characteristics (total persons and persons employed at a nonfarm
job for wages and salaries) there is some contribution of interviewer error
to the total variance. For these characteristics the optimum is fairly
broad, i.e., for m between 4 and 16 the variance of \(\bar x\) will be within 13 per
cent of the optimum.
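The breadth of the optimum can be checked in the same way. The sketch below assumes that, for each trial m, the number of segments n is as large as the $400 budget allows at $6 per segment and $7 per interviewer. It also uses the fact, obtained by substituting Eqs. 3.16 to 3.18 into Eq. 3.3, that the variance at the unrounded optimum equals \((\sqrt{C_1(\sigma_x^2-\sigma_{xI})}+\sqrt{C_2\sigma_{xI}})^2/C\). Negative covariance estimates are taken as zero, as in the text.

```python
import math

C, C1, C2 = 400, 6, 7           # Baltimore budget and unit costs (dollars)

def var_xbar(n, m, s2_x, s_xI):
    """Eq. 3.3, with a negative covariance estimate taken as zero."""
    s_xI = max(s_xI, 0.0)
    return (s2_x - s_xI) / n + s_xI / m

def optimum_var(s2_x, s_xI):
    """Variance at the unrounded optimum of Eqs. 3.16-3.18 (needs s_xI > 0)."""
    return (math.sqrt(C1 * (s2_x - s_xI)) + math.sqrt(C2 * s_xI)) ** 2 / C

def segments_for(m):
    """Assumed rule: every dollar not spent on interviewers buys segments."""
    return (C - C2 * m) // C1

worst_ratio = {}
for name, s2_x, s_xI in [("total persons", 64.5, 1.04),
                         ("nonfarm wage or salary", 44.0, 1.28)]:
    best = optimum_var(s2_x, s_xI)
    worst_ratio[name] = max(var_xbar(segments_for(m), m, s2_x, s_xI) / best
                            for m in (4, 8, 16))
    print(f"{name}: worst case for m = 4, 8, 16 is {worst_ratio[name]:.3f} x optimum")

# Persons under 14 years: s_xI = -.68 is taken as zero, so the smallest m wins.
best_m_under_14 = min((2, 4, 8, 16),
                      key=lambda m: var_xbar(segments_for(m), m, 4.98, -0.68))
print("best m for persons under 14:", best_m_under_14)
```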

4. Use of the specified mathematical model in minimizing the effect of 
both bias and variance. The preceding section indicates a method for 


Sec. 4 MINIMIZING BIAS AND VARIANCE 299 

determining the optimum under fixed essential conditions. In many cases 
where it is evident that a particular survey technique is subject to
substantial response bias, alternative techniques may be available that will
reduce the bias. We must, of course, consider the relative cost of such 
alternatives. 

Choosing a single sampling design. We may have a choice of alternative 
methods, each with different essential conditions, response bias, and 
optimum values of n and m. For a fixed total cost we can determine the 
optimum values of n and m for each such method. Then the optimum 
method among those examined is the one which gives the lowest mean 
square error. For example, experience in determining farm expenditures 
by direct questioning of farm operators has shown that the results are 
often subject to considerable error. Determining farm expenditures by 
other techniques, such as detailed examination of purchase records, may 
be more accurate but considerably more expensive. We can determine 
the optimum for direct questioning and for detailed examination of
purchase records, subject to a fixed total budget, and select the method which
gives the lower mean square error. The optimum method for one budget 
level may be different from that for another budget level. 

Use of double sampling. In some cases, instead of using a single 
method, a combination of two methods in a double sampling design may 
prove more efficient. For example, we could interview a relatively large 
number of cases (possibly even the entire population) by one of the cheaper 
(and less accurate) methods and reinterview a subsample by one of the 
more expensive methods. Such a double sampling approach is likely to 
be useful in instances where methods with low response bias cost many 
times as much as methods with higher response bias. 

Suppose that our original sample is drawn as described in the previous
section and we have sampled n individuals and m interviewers (\(m_h\) from
the hth group). For this sample, we obtain responses under the
essential conditions of the initial survey, which we shall designate as
essential conditions X. For the subsample we take (at random) an equal
number out of the individuals assigned to each interviewer, giving a
subsample of n' individuals. For the subsample we shall use a set of
L' interviewer groups (which may or may not be the same as the original
interviewer groups). We draw m' interviewers (\(m'_p\) from the pth group) in
such manner that an equal number of interviews can be given to each
interviewer. It will be noted that the interviewers for the subsample are
drawn independently of those for the original sample and that m' can be
less than, equal to, or greater than m. For the subsample we have the
responses obtained by the original interviewers and we also have
responses obtained by the second set of interviewers under essential




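conditions Z. We may use as an estimate* of the true population mean \(\bar Y\):

\[\bar z = \frac{\bar z'}{\bar x'}\,\bar x \tag{4.1}\]

where \(\bar x\) = mean of the \(x_{hij}\) values for the entire sample of n individuals.
\(\bar x'\) = mean of the \(x_{hij}\) values for the subsample of n' individuals.
\(\bar z'\) = mean of the values obtained under conditions Z for the subsample of n' individuals.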

Actually, it might be more efficient from a sampling viewpoint to draw
the m' interviewers for the subsample of clusters as a subsample of the m
interviewers used for the original sample of clusters. However, the main
purpose is to reduce the response bias, and this may mean the use of
better-qualified or better-trained interviewers. Consequently, the second
set of interviewers may be drawn from a different population of
interviewers.

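The mean square error with double sampling.† It is assumed that
interviews under conditions Z are more expensive than under conditions
X and that method Z has a considerably smaller response bias. For the
specified mathematical model, the mean square error of \(\bar z\) will be
approximately

\[\mathrm{MSE}_{\bar z} \doteq B_z^2 + \bar Z^2\left(\frac{U}{n} + \frac{V}{n'} + \frac{W}{m'}\right) \tag{4.2}\]

where \(\bar Z = E\bar z' \doteq E\bar z\), \(\bar X = E\bar x = E\bar x'\), \(B_z = \bar Z - \bar Y\),

\[U = \frac{2\rho_{xz}\sigma_x\sigma_z}{\bar X\bar Z} - \frac{\sigma_x^2 - \sigma_{xI}}{\bar X^2} \tag{4.3}\]

\[V = \frac{\sigma_x^2 - \sigma_{xI}}{\bar X^2} + \frac{\sigma_z^2 - \sigma_{zI}}{\bar Z^2} - \frac{2\rho_{xz}\sigma_x\sigma_z}{\bar X\bar Z} \tag{4.4}\]

\[W = \frac{\sigma_{zI}}{\bar Z^2} \tag{4.5}\]

\(\rho_{xz}\) = correlation between the expected X and Z values for the same
individual, and \(\sigma_z^2\) and \(\sigma_{zI}\) are defined for conditions Z as \(\sigma_x^2\) and
\(\sigma_{xI}\) are for conditions X.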


* This estimate \(\bar z\) is, of course, a ratio of random variables and is biased but
consistent.

† For the derivation of the formulas presented in this section, see Sec. 7d.
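Eqs. 4.1 to 4.5 can be evaluated numerically. The sketch below computes U, V, and W and the approximate MSE of Eq. 4.2. As a check it uses the hypothetical values given later in Tables 3 and 4 for Method 1 (conditions X) and Method 5 (conditions Z), in thousands of dollars with \(\bar Y\) = 100, and the sample sizes reported for that combination in Table 5. The function names are illustrative only.

```python
def ratio_estimate(x_bar, x_bar_sub, z_bar_sub):
    """Eq. 4.1: z-bar = (z-bar' / x-bar') * x-bar."""
    return z_bar_sub / x_bar_sub * x_bar

def u_v_w(X, Z, sd_x, sd_z, cov_xI, cov_zI, rho):
    """Eqs. 4.3-4.5. sd_x, sd_z are sigma_x, sigma_z; cov_xI, cov_zI are sigma_xI, sigma_zI."""
    cross = 2 * rho * sd_x * sd_z / (X * Z)
    U = cross - (sd_x**2 - cov_xI) / X**2
    V = (sd_x**2 - cov_xI) / X**2 + (sd_z**2 - cov_zI) / Z**2 - cross
    W = cov_zI / Z**2
    return U, V, W

def mse_double(n, n_sub, m_sub, bias_z, Z, U, V, W):
    """Eq. 4.2: approximate mean square error of the ratio estimate."""
    return bias_z**2 + Z**2 * (U / n + V / n_sub + W / m_sub)

Y = 100.0                                   # true mean, thousands of dollars
B_x, B_z = -11.0, -0.6                      # response biases of Methods 1 and 5
X, Z = Y + B_x, Y + B_z                     # expected values under X and Z
U, V, W = u_v_w(X, Z, sd_x=83, sd_z=71, cov_xI=25**2, cov_zI=6**2, rho=0.85)
mse = mse_double(3480, 382, 21, B_z, Z, U, V, W)
print(f"U = {U:.4f}, V = {V:.4f}, W = {W:.5f}, MSE = {mse:.2f}")
```

The result, about 7.3, matches the Table 5 entry for Methods 1 and 5. Note that m does not appear in the calculation at all.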






An optimum double sampling design. With a combination of two
methods there is, in general, a set of optimum values for n, n', and m'.
As in the preceding section we shall consider only the case where the cost
function is simple and the optimum values can be determined directly.
We shall assume the cost function:

\[C = C_1 n + C_2 m + C_3 n' + C_4 m' \tag{4.6}\]

where \(C_1\) = cost per individual under conditions X.
\(C_2\) = cost per interviewer under conditions X.
\(C_3\) = cost per individual under conditions Z.
\(C_4\) = cost per interviewer under conditions Z.
C = total survey budget (excluding any fixed overhead costs).

Since the MSE of \(\bar z\) does not involve m (to the order of approximation used
in Eq. 4.2*) but the cost increases with m, the optimum design would
call for making m as small as possible. Usually, the minimum number
of interviewers will be determined by administrative considerations, i.e.,
an interviewer can be expected to complete a certain number of interviews
a day, and, if the survey results must be available at some specified time,
we must give an interviewer no more than the number of cases he can
complete within the time period allowed. If, then, we decide that an
interviewer shall not do more than \(t_x\) interviews, the smallest value we
can give to m is \(m = n/t_x\), and the optimum values of n, n', and m' are

\[n = k\sqrt{\frac{U}{C_1 + C_2/t_x}} \tag{4.7}\]

\[n' = k\sqrt{\frac{V}{C_3}} \tag{4.8}\]

\[m' = k\sqrt{\frac{W}{C_4}} \tag{4.9}\]

where

\[k = \frac{C}{\sqrt{U(C_1 + C_2/t_x)} + \sqrt{V C_3} + \sqrt{W C_4}} \tag{4.10}\]


* Equation 4.2 is an approximation which ignores terms of the third and
higher order. Where m, n, m', or n' is small, some of these terms may be
appreciable and the approximation may be poor.
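Eqs. 4.7 to 4.10 can be sketched the same way. The function below assumes that U, V, and W are all positive, since otherwise the square roots are undefined. As a check it uses the hypothetical Method 1 and Method 5 values of Tables 3 and 4: for Method 1, \(C_1\) = $0.25, \(C_2\) = $25, and \(t_x\) = 100; for Method 5, \(C_3\) = $20 and \(C_4\) = $150; and C = 12,500. The function name is hypothetical.

```python
import math

def optimum_double(C, C1, C2, C3, C4, t_x, U, V, W):
    """Eqs. 4.7-4.10 with m held at n / t_x; returns unrounded (n, m, n', m')."""
    if min(U, V, W) <= 0:
        raise ValueError("U, V and W must all be positive")
    c_n = C1 + C2 / t_x                         # cost per case under X, interviewer share included
    k = C / (math.sqrt(U * c_n) + math.sqrt(V * C3) + math.sqrt(W * C4))   # Eq. 4.10
    n = k * math.sqrt(U / c_n)                  # Eq. 4.7
    n_sub = k * math.sqrt(V / C3)               # Eq. 4.8
    m_sub = k * math.sqrt(W / C4)               # Eq. 4.9
    return n, n / t_x, n_sub, m_sub

# U, V, W for Method 1 (X) and Method 5 (Z) by Eqs. 4.3-4.5, thousands of dollars
X, Z = 89.0, 99.4
cross = 2 * 0.85 * 83 * 71 / (X * Z)
U = cross - (83**2 - 25**2) / X**2
V = (83**2 - 25**2) / X**2 + (71**2 - 6**2) / Z**2 - cross
W = 6**2 / Z**2

n, m, n_sub, m_sub = optimum_double(12500, 0.25, 25, 20, 150, 100, U, V, W)
print(f"n = {n:.0f}, m = {m:.0f}, n' = {n_sub:.0f}, m' = {m_sub:.0f}")
```

The rounded results agree with the Methods 1 and 5 line of Table 5 (n = 3480, m = 35, n' = 382, m' = 21).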






Using the optimum values in the equation for MSE of z (Eq. 4.2) will 
permit us to compare a combination of Methods X and Z with either 
method alone or with other methods and combinations to determine the 
optimum design.

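Estimation of variances and biases. We must, of course, have some idea
of the costs and of the values of U, V, W, and \(B_z\). We can estimate \(\bar X\)
and \(\bar Z\) from a sample (using \(\bar x\) as an estimate of \(\bar X\) and \(\bar z\) as an estimate of
\(\bar Z\)). The variances can be estimated by means of Eq. 3.8 and 3.9, and an
unbiased estimate of \(\rho_{xz}\sigma_x\sigma_z\) is provided by

\[\widehat{\rho_{xz}\sigma_x\sigma_z} = \frac{1}{n' - 1}\sum_{k}^{n'} (x'_k - \bar x')(z'_k - \bar z') \tag{4.11}\]

where \(x'_k\) and \(z'_k\) are the responses under conditions X and Z for the kth
individual in the subsample.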


Estimation of the response bias, Bz, is a more difficult problem since 
this error involves the unknown true population mean Y. However, a 
satisfactory comparison of several methods can sometimes be made in 
instances where one is justified in assuming a negligible response bias for 
the method which is considered most accurate and (from previous
experience or a pilot sample study) estimating the differences in expected value
between this most accurate method and the other methods considered. 

For example, if Method Z is one which is subject to negligible response
bias, as an estimate of bias for some other method we can use either

\[b_x = \bar x - \bar z \tag{4.12}\]

or

\[b_x = \bar x' - \bar z' \tag{4.13}\]

These estimates are, of course, subject to sampling error, and formulas
for the variances are given in Sec. 7e.
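A sketch of Eq. 4.11 and of the two bias estimates follows. It assumes that the subsample responses are available as paired lists holding, for the same individuals, the response under conditions X and the response under conditions Z. The names and toy numbers are hypothetical, and the bias estimates follow Eqs. 4.12 and 4.13 as written above (Method Z taken as unbiased).

```python
from statistics import fmean

def cross_covariance(x_sub, z_sub):
    """Eq. 4.11: unbiased estimate of rho_xz * sigma_x * sigma_z from paired subsample responses."""
    if len(x_sub) != len(z_sub) or len(x_sub) < 2:
        raise ValueError("need at least two paired responses")
    xb, zb = fmean(x_sub), fmean(z_sub)
    return sum((x - xb) * (z - zb) for x, z in zip(x_sub, z_sub)) / (len(x_sub) - 1)

def bias_estimates(x_all, x_sub, z_sub):
    """Bias of method X when method Z is taken as unbiased.

    Eq. 4.12 uses the full-sample mean and the ratio estimate of Eq. 4.1;
    Eq. 4.13 uses the subsample means only.
    """
    x_bar, xb_sub, zb_sub = fmean(x_all), fmean(x_sub), fmean(z_sub)
    z_bar = zb_sub / xb_sub * x_bar              # Eq. 4.1
    return x_bar - z_bar, xb_sub - zb_sub

x_all = [1, 2, 3, 4, 5, 6, 7, 8]                # toy responses under conditions X
x_sub, z_sub = [1, 2, 3, 4], [2, 4, 6, 8]       # subsample: same individuals under X and Z
print(cross_covariance(x_sub, z_sub), bias_estimates(x_all, x_sub, z_sub))
```

Both bias estimates are subject to sampling error, and with samples this small they can differ considerably.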

Illustration of jointly minimizing variance and bias. To illustrate the
technique for determining the method which minimizes the mean square
error we shall use a problem which involves estimating the average dollar
inventory of a group of retail stores. Let us assume that the population
consists of all retail stores in a large city and that our budget for the
survey is $15,000, of which $2500 has been set aside for fixed overhead
(so that C = 12,500). We shall also assume that the maximum
assignment to an interviewer, \(t_x\), is as shown in Table 3. Suppose that pilot
studies and previous experience give cost, variance, correlation, and
response bias estimates for five different techniques and that we wish to
determine which technique (or combination of two techniques) to use.
Let us assume that the estimates of unit costs, response bias, and variances
for each technique are as shown in Table 3 and the correlations, \(\rho_{xz}\),
for each pair of techniques are as shown in Table 4. We shall take
Ȳ = $100,000.





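Table 3. Cost factors, maximum assignments, biases, and variances
for a study of retail store inventories

| Method | Unit cost per store, \(C_1\) (dollars) | Unit cost per interviewer, \(C_2\) (dollars) | Maximum assignment per interviewer, \(t_x\) | Response bias, \(B_x\) (thousands of dollars) | Square root of variance, \(\sigma_x\) (thousands of dollars) | Square root of covariance, \(\sqrt{\sigma_{xI}}\) (thousands of dollars) |
|---|---|---|---|---|---|---|
| 1 | .25 | 25 | 100 | -11.0 | 83 | 25 |
| 2 | 2 | 50 | 60 | -6.0 | 80 | 15 |
| 3 | 6 | 100 | 40 | -2.5 | 76 | 10 |
| 4 | 12 | 150 | 35 | -.8 | 73 | 9 |
| 5 | 20 | 150 | 35 | -.6 | 71 | 6 |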


Table 4. Correlation between expected values of individual responses (\(\rho_{xz}\))

| Method | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 2 | .79 | | | |
| 3 | .82 | .84 | | |
| 4 | .84 | .87 | .91 | |
| 5 | .85 | .88 | .92 | .95 |


Table 5 shows the mean square error which would be obtained for each
method and each combination of methods, using with the single sampling
method the values of n and m given by Eq. 3.16 and 3.17 and with the
double sampling method the values of n, n', and m' given by Eq. 4.7, 4.8,
and 4.9 with \(m = n/t_x\).

The optimum, if only a single method is employed, is to use Method 4
with n = 723 and m = 25. However, double sampling permits a further
reduction of 35 per cent in the MSE by using Methods 1 and 5 with
n = 3480, m = 35, n' = 382, and m' = 21. In many cases double
sampling will not give gains of this magnitude over a good single sampling 
method. It should be noted that the figures in Tables 4 and 5 are 
hypothetical and are used only to illustrate the methods. 
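The whole comparison of Table 5 can be reproduced approximately in a few lines. The sketch below uses the hypothetical values of Tables 3 and 4 and evaluates each single method at the unrounded optimum of Eqs. 3.16 to 3.18. There the MSE reduces to \(B_x^2 + (\sqrt{C_1(\sigma_x^2-\sigma_{xI})}+\sqrt{C_2\sigma_{xI}})^2/C\). Each pair is evaluated at the optimum of Eqs. 4.7 to 4.10, where Eq. 4.2 reduces to \(B_z^2 + \bar Z^2(\sqrt{U(C_1+C_2/t_x)}+\sqrt{VC_3}+\sqrt{WC_4})^2/C\). Because n and m are not rounded, the results can differ slightly from the table in the first decimal.

```python
import math
from itertools import combinations

C, Y = 12500, 100.0   # budget (dollars) and true mean (thousands of dollars)
# method: (C1 per store, C2 per interviewer, t_x, bias, sigma_x, sqrt(sigma_xI)), Table 3
METHODS = {1: (0.25, 25, 100, -11.0, 83, 25), 2: (2, 50, 60, -6.0, 80, 15),
           3: (6, 100, 40, -2.5, 76, 10), 4: (12, 150, 35, -0.8, 73, 9),
           5: (20, 150, 35, -0.6, 71, 6)}
RHO = {(1, 2): .79, (1, 3): .82, (2, 3): .84, (1, 4): .84, (2, 4): .87,
       (3, 4): .91, (1, 5): .85, (2, 5): .88, (3, 5): .92, (4, 5): .95}   # Table 4

def single_mse(j):
    """Minimum MSE for one method (the t_x limits do not bind for any method here)."""
    c1, c2, _, bias, sd, root_cov = METHODS[j]
    a, cov = sd**2 - root_cov**2, root_cov**2
    return bias**2 + (math.sqrt(c1 * a) + math.sqrt(c2 * cov)) ** 2 / C

def double_mse(x, z):
    """Minimum MSE of Eq. 4.2 with method x as conditions X and method z as conditions Z."""
    c1, c2, t_x, bx, sdx, rx = METHODS[x]
    c3, c4, _, bz, sdz, rz = METHODS[z]
    X, Z = Y + bx, Y + bz
    cross = 2 * RHO[(x, z)] * sdx * sdz / (X * Z)
    U = cross - (sdx**2 - rx**2) / X**2                               # Eq. 4.3
    V = (sdx**2 - rx**2) / X**2 + (sdz**2 - rz**2) / Z**2 - cross     # Eq. 4.4
    W = rz**2 / Z**2                                                  # Eq. 4.5
    if min(U, V, W) <= 0:
        return math.inf                        # the optimum formulas do not apply
    root_sum = (math.sqrt(U * (c1 + c2 / t_x)) + math.sqrt(V * c3)
                + math.sqrt(W * c4))
    return bz**2 + Z**2 * root_sum**2 / C

results = {(j,): single_mse(j) for j in METHODS}
results.update({pair: double_mse(*pair) for pair in combinations(METHODS, 2)})
best_single = min((d for d in results if len(d) == 1), key=results.get)
best_overall = min(results, key=results.get)
for design in sorted(results, key=results.get)[:3]:
    print(design, round(results[design], 1))
```

The three best designs come out as Methods 1 and 5, Methods 1 and 4, and Method 4 alone, in the same order as in Table 5.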

In situations of this type it may frequently be necessary to increase 
expenditures per unit many times in order to reduce the response bias 
from 10 per cent to 2 per cent. For example, the A. C. Nielsen Company, 






which compiles data on sales of commodities by retail stores, has found 
that it can obtain sufficiently accurate reports on sales only by personally
checking physical inventory and purchase invoices. As another example,
the Bureau of the Census reinterviewed a sample of respondents, using 
in the reinterview professional personnel from the Washington office. 

Table 5. Comparison of minimum mean square errors for five alternative
methods (and combinations of them)

| Methods | n | m | n' | m' | Minimum mean square error for the indicated method |
|---|---|---|---|---|---|
| 1 | 12,022 | 380 | — | — | 123.2 |
| 2 | 3,197 | 122 | — | — | 39.8 |
| 3 | 1,350 | 44 | — | — | 12.7 |
| 4 | 723 | 25 | — | — | 11.1 |
| 5 | 508 | 16 | — | — | 12.5 |
| 1 and 2 | 5,242 | 52 | 1,869 | 123 | 39.7 |
| 1 and 3 | 4,382 | 44 | 883 | 50 | 11.2 |
| 1 and 4 | 3,582 | 36 | 503 | 31 | 7.6 |
| 1 and 5 | 3,480 | 35 | 382 | 21 | 7.3 |
| 2 and 3 | 1,521 | 25 | 690 | 41 | 13.8 |
| 2 and 4 | 1,340 | 22 | 392 | 27 | 10.1 |
| 2 and 5 | 1,329 | 22 | 302 | 18 | 9.5 |
| 3 and 4 | 703 | 18 | 260 | 23 | 13.1 |
| 3 and 5 | 718 | 18 | 201 | 16 | 12.0 |
| 4 and 5 | 481 | 14 | 129 | 14 | 15.7 |


The average cost per interview was of the order of 7 times the average cost
for the original interview, and there was a significant increase in the
accuracy of certain items such as coverage of persons. On the other
hand, in some cases increases in expenditure may yield only small gains
in accuracy, or large gains for some items and small gains for others. In
the same study by the Bureau of the Census, the per cent distribution
into 10-year age groups, for example, was practically identical for both
interviews, with none of the 10-year age groups differing by more than
1 of 1 per cent from the original interview to the reinterview. If the
primary aim of the survey were to obtain an accurate per cent distribution
by age, the more expensive method would not be justified.

Frequently, it is also possible for very small increases in expenditures
to produce large gains in accuracy. In one instance, for example, the
Bureau of the Census in its Current Population Survey had been getting



Sec. 5 UNCORRELATED AND COMPENSATING ERRORS 305 

a large number of persons erroneously reported as not in the labor force. 
A revision of the questions asked added nearly 2 million of these “missed” 
persons to the labor force (2). The revision added practically nothing to 
the cost of the survey. 

Thus, there is no “typical” relation between cost and accuracy. Each 
survey presents its own picture. The method outlined is general in its 
applicability, although the answers obtained will vary. 

It should be noted that the work of determining the “optimum” design 
can frequently be shortened by eliminating from consideration alternatives 
which are obviously inefficient. For example, Method 5 in the illustration 
above involves a cost per store two-thirds greater than that of Method 4, 
but the response bias for the two methods differs by only a trivial amount. 
The higher expenditure per unit in Method 5 improves the values for 
individual stores, but the individual response errors of Method 4 are 
largely “compensating” in nature. Although the combination of Methods 
1 and 5 gives the lowest MSE, the result does not differ appreciably from 
that for Methods 1 and 4. Thus, consideration of both Method 4 and 
Method 5 was really unnecessary in selecting an optimum. 
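The screening argument above can be checked directly against Table 5. This small Python sketch copies the minimum mean square errors from the table and compares the best design with the best design that never uses Method 5; the dictionary layout and the names are illustrative only.

```python
# Minimum mean square errors from Table 5, keyed by method or combination.
min_mse = {
    "1": 123.2, "2": 39.8, "3": 12.7, "4": 11.1, "5": 12.5,
    "1 and 2": 39.7, "1 and 3": 11.2, "1 and 4": 7.6, "1 and 5": 7.3,
    "2 and 3": 13.8, "2 and 4": 10.1, "2 and 5": 9.5,
    "3 and 4": 13.1, "3 and 5": 12.0, "4 and 5": 15.7,
}

best = min(min_mse, key=min_mse.get)

# Screen out every alternative that uses Method 5, then re-optimize.
without_5 = {k: v for k, v in min_mse.items() if "5" not in k.split(" and ")}
best_without_5 = min(without_5, key=without_5.get)

# Relative increase in the minimum MSE caused by ignoring Method 5 altogether.
relative_loss = (min_mse[best_without_5] - min_mse[best]) / min_mse[best]

print(best, best_without_5, round(relative_loss, 3))  # 1 and 5 1 and 4 0.041
```

Dropping Method 5 raises the minimum MSE by only about 4 per cent, which is the kind of difference the text treats as not appreciable.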

5. Effect of uncorrelated and compensating response errors. A consideration 
of the specified mathematical model leads to the conclusion 
that response errors that are uncorrelated with each other and compensating 
in character do not necessarily need any special attention in survey 
design whenever the purpose is to estimate a mean or total for the total 
population or for a subgroup when the members of the subgroup are 
identified without error. Furthermore, in this case, the formulas presented 
in previous chapters for estimating sampling error reflect the response 
error properly and no special attention need be given to the 
presence of response errors. This situation is, however, often assumed 
to exist without valid evidence. It is not at all uncommon for the results 
of the survey to be justified on the basis that “some of the errors were 
positive and some were negative, so that the net effect is undoubtedly 
close to zero.” Of course, it is not possible to assume that just because 
there are both positive and negative errors their effect is necessarily 
compensating, and such an assumption can very often lead to erroneous 
conclusions. Moreover, as we have already seen, if the response errors 
are correlated with each other (as within the work of a single interviewer), 
the variance is increased and the chances of errors being compensating 
are reduced. However, let us consider the situation where there is 
evidence that the errors are, in fact, compensating and uncorrelated. 

The essential points can be seen more easily with the very simple 
situation in which the response errors are uncorrelated for any two 





306 RESPONSE ERRORS IN SURVEYS Ch. 12 

individuals in the population and a random sample of individuals is 
drawn without restriction and with replacement. The variance of a 
sample estimate of the true population mean under these conditions is 

$$\sigma_{\bar x}^2 = \frac{\sigma_X^2}{n} \qquad (5.1)$$

Since $X_{ij} = Y_i + R_{ij}$ (where $Y_i$ is the true value of the characteristic for 
the $i$th elementary unit of the population, and $R_{ij}$ is the response error on 
the $j$th measurement for the $i$th elementary unit), we can express $\sigma_X^2$ as 

$$\sigma_X^2 = \sigma_Y^2 + \sigma_R^2 + 2\rho_{YR}\,\sigma_Y\sigma_R \qquad (5.2)$$

where $\sigma_Y^2$ = variance of the true values. 

$\sigma_R^2$ = variance of the response errors, which is composed of the 
variance of response errors for an individual around the 
individual’s expected response error and the variance in 
the expected value of response errors between individuals. 

$\rho_{YR}$ = correlation between the $Y_i$ and the $R_{ij}$. Note that 
$\rho_{YR}\sigma_Y\sigma_R = \rho_{Y\bar R}\,\sigma_Y\sigma_{\bar R}$, where $\rho_{Y\bar R}$ is the correlation between 
the $Y_i$ and the $\bar R_i$, the expected response error for the $i$th 
individual. 
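Equation 5.2 can be checked numerically. The Python sketch below is illustrative only: it assumes that the true values and the response errors are jointly normal with correlation ρ_YR (the text makes no distributional assumption), simulates responses X = Y + R, and compares their variance with the right-hand side of Eq. 5.2. The function names and parameter values are hypothetical.

```python
import math
import random

def response_variance(sigma_y, sigma_r, rho_yr):
    """Right-hand side of Eq. 5.2."""
    return sigma_y ** 2 + sigma_r ** 2 + 2 * rho_yr * sigma_y * sigma_r

def simulated_response_variance(n, sigma_y, sigma_r, rho_yr, seed=1):
    """Variance of n simulated responses X = Y + R.

    Assumption for illustration only: (Y, R) is bivariate normal with
    correlation rho_yr; the mean of Y plays no role and is set to zero.
    """
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y = sigma_y * z1                                                # true value
        r = sigma_r * (rho_yr * z1 + math.sqrt(1 - rho_yr ** 2) * z2)  # response error
        xs.append(y + r)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

exact = response_variance(10.0, 10.0, 0.1)
simulated = simulated_response_variance(100_000, 10.0, 10.0, 0.1)
print(exact)                # 220.0
print(round(simulated, 1))  # close to 220
```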

Response errors are reflected in the usual sample variance. We see from 
Eq. 5.2 that $\sigma_X^2$ reflects any effects of the response errors as well as the variance 
of the true values. Similarly, if we estimate $\sigma_X^2$ by 

$$s_X^2 = \frac{\sum_i^n (x_i - \bar x)^2}{n-1}$$

the effect of the response errors will appear in the estimated variance, 
since the expected value of $s_X^2$ is equal to $\sigma_X^2$. Consequently the estimated 
variance of the sample mean will include appropriate allowance for the 
response variation. 
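A minimal simulation of the point that $s_X^2$ estimates $\sigma_X^2$, response variation included, and not the variance of the true values alone. It assumes, for illustration only, independent normal true values and response errors and sampling with replacement from an effectively infinite population; the names and values are hypothetical.

```python
import random

def sample_variance(values):
    """s^2 with divisor n - 1, as in the estimator of sigma_X^2 above."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def average_s2(reps, n, sigma_y, sigma_r, seed=2):
    """Average s^2 of observed responses X = Y + R over repeated samples.

    Assumption for illustration: Y and R are independent normal variables,
    so that sigma_X^2 = sigma_Y^2 + sigma_R^2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma_y) + rng.gauss(0.0, sigma_r) for _ in range(n)]
        total += sample_variance(sample)
    return total / reps

avg = average_s2(reps=4000, n=25, sigma_y=10.0, sigma_r=5.0)
print(round(avg, 1))  # near 125 (= 100 + 25), not the true-value variance 100
```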

Reducing the effect of response variation. If we assume a fixed total 
budget (after deducting fixed overhead), C, then the number of cases 
which can be sampled is equal to C divided by the unit cost $C_X$. Thus, 
we have 

$$\sigma_{\bar x}^2 = \frac{\sigma_X^2}{n} = \frac{C_X\,\sigma_X^2}{C}$$

Let us compare the results of Method X (which gives the response $X_{ij}$) 
with those of Method Z (which gives the response $Z_{ij}$), assuming that 
Method Z has a unit cost $C_Z > C_X$, that $\sigma_{R_Z} < \sigma_{R_X}$, and that $\bar R_Z = \bar R_X = 0$. 
Then, Method Z is preferable to Method X only if 

$$\sigma_{\bar z}^2 = \frac{C_Z\,\sigma_Z^2}{C} < \frac{C_X\,\sigma_X^2}{C} = \sigma_{\bar x}^2 \qquad (5.4)$$

or if 

$$\frac{C_Z}{C_X} < \frac{\sigma_X^2}{\sigma_Z^2} \qquad (5.5)$$

Inequality 5.5 provides a test of the relative efficiencies of two data-collecting 
techniques which have either no response bias or the same 
response bias. Let us consider a hypothetical example. Suppose that 
we have a choice between two methods of estimating a characteristic, both 
methods using a simple random sample of families: 

For Method X: $C_X = \$2$, $\sigma_{R_X} = 10$, $\rho_{YR_X} = .1$ 
For Method Z: $C_Z = \$5$, $\sigma_{R_Z} = 2$, $\rho_{YR_Z} = .1$ 
For both methods: $\sigma_Y = 10$ 


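Since both methods have the same response bias, Method Z will be 
more efficient than Method X only if Inequality 5.5 holds. Using the 
figures just presented gives 

$$\sigma_X^2 = 10^2 + 10^2 + 2(.1)(10)(10) = 220, \qquad \sigma_Z^2 = 10^2 + 2^2 + 2(.1)(10)(2) = 108$$

$$\frac{C_Z}{C_X} = \frac{5}{2} = 2.5 > \frac{\sigma_X^2}{\sigma_Z^2} = \frac{220}{108} = 2.04$$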

Thus Inequality 5.5 does not hold, and we gain more by putting our 
funds into the larger sample permitted by the lower unit cost of Method X 
than by putting them into the reduction of response error permitted by 
Method Z (even though this reduction in response error is quite substantial). 
It appears that ordinarily any appreciable increase in expenditure 
to decrease response variation will be unwarranted if there is no 
effect on the response bias. 
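The comparison can be written as a small function. This Python sketch assumes the example's figures read σ_Y = 10, σ_{R_X} = 10, σ_{R_Z} = 2 and ρ_YR = .1 for both methods, with unit costs of $2 and $5; the $1,000 budget is hypothetical and only shows that, as Inequality 5.5 implies, Method X also gives the smaller variance of the mean for a fixed outlay.

```python
def response_variance(sigma_y, sigma_r, rho_yr):
    """Eq. 5.2: sigma_X^2 = sigma_Y^2 + sigma_R^2 + 2 rho sigma_Y sigma_R."""
    return sigma_y ** 2 + sigma_r ** 2 + 2 * rho_yr * sigma_y * sigma_r

def z_preferable(cost_x, cost_z, var_x, var_z):
    """Inequality 5.5 (equal response bias): Z wins only if C_Z/C_X < var_X/var_Z."""
    return cost_z / cost_x < var_x / var_z

def variance_of_mean(budget, unit_cost, var):
    """A budget C buys n = C / unit_cost cases, so var(mean) = unit_cost * var / C."""
    return unit_cost * var / budget

cost_x, cost_z = 2.0, 5.0
var_x = response_variance(10.0, 10.0, 0.1)  # Method X
var_z = response_variance(10.0, 2.0, 0.1)   # Method Z
budget = 1000.0                             # hypothetical budget

print(var_x, var_z)                                  # 220.0 108.0
print(z_preferable(cost_x, cost_z, var_x, var_z))    # False
print(variance_of_mean(budget, cost_x, var_x), variance_of_mean(budget, cost_z, var_z))
```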

If we are estimating the proportion of the elementary units having a 
given characteristic, any expenditure to reduce response errors will be 
wasted whenever the response errors are compensating so that $\bar Z = \bar X$, 
since, in this case, 

$$\sigma_X^2 = \sigma_Z^2 = \bar X(1-\bar X) = \bar Z(1-\bar Z)$$
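For a 0–1 characteristic the variance depends only on the proportion. In this hypothetical Python example, 10 of 100 responses are wrong, but the misreports cancel, so the reported proportion, and hence p(1 − p), is unchanged.

```python
def proportion_variance(values):
    """Variance p(1 - p) of a 0-1 variable, where p is the proportion of ones."""
    p = sum(values) / len(values)
    return p * (1 - p)

# Hypothetical population of 100 units, 30 of which have the characteristic.
true_values = [1] * 30 + [0] * 70
# Hypothetical compensating errors: 5 of the 30 are reported as 0 and
# 5 of the 70 are reported as 1, so the reported proportion is still 0.30.
responses = [0] * 5 + [1] * 25 + [1] * 5 + [0] * 65

errors = sum(t != r for t, r in zip(true_values, responses))
print(errors, proportion_variance(true_values), proportion_variance(responses))
```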

As a practical matter, the difficulty is that there is rarely any assurance 
that response errors are, in fact, compensating, and it is not safe, without 
extensive investigation, to concentrate on sampling variance and assume 
that response errors are of no importance. Even if we were to reduce 
sampling error to negligible proportions by taking a complete census, we 
might still have a substantial MSE because of response bias. 









6. Applicability of the specified mathematical model. The analysis 
presented above applies, of course, only when the conditions of the 
specified mathematical model apply. It is important, therefore, that we 
examine these conditions in terms of the situations actually prevalent in 
typical surveys. With regard to the selection of individuals or other 
sampling units for interview, it is feasible to use techniques (e.g., selection 
dependent upon a table of random numbers) which give random samples. 
However, a determination that individual responses behave like random 
variables will require much more experimental evidence than is now 
available, and will necessarily be subject to some question. 

As previously noted, the conditions which determine the response of 
any individual may be regarded as divided into two groups: 

(a) Those conditions which are “constant,” “controlled,” and predetermined 
for a given individual response, e.g., the questions to be 
asked, the type of interviewer. We have referred to these as the 
essential survey conditions. 

(b) Those conditions which are adventitious and “unpredictable,” e.g., 
the mood of the respondent, a momentary distraction which results 
in a question being misunderstood. 

This division is similar to the division between “assignable (i.e., controllable)” 
causes and “residual” causes of variation in discussions of 
quality control. We have treated these two groups of factors in the 
same way as they are treated in the quality control field. Thus, we 
consider the “adventitious” factors as giving rise to a random variable, the 
response obtained for a given individual being one of the values of this 
variate. The “controlled” causes would determine the expected value of 
this random variable. They also affect its variance. 

It should be noted that the present analysis does not provide for 
measuring the response variance of a single individual apart from the 
variance between individuals. Such measurement would be feasible 
experimentally were it not for the conditioning of the respondent. But, 
in practice, repeated interviews on a single respondent are not independent. 
Thus, a direct test of the random nature of individual response variation 
cannot be made by using the specified mathematical model. We can, 
however, determine the total variance and, by applying the specified 
mathematical model in a large number of cases, test its approximation to 
actual conditions. 

A further assumption, which can be accomplished in fact, but ordinarily 
is not, is that the available sample individuals are assigned to interviewers 
at random and that, within an interviewer group, the selection of interviewers 
is independent of the selection of individuals. The survey 



Sec. 7 DERIVATIONS AND PROOFS 309 

supervisor will usually try to arrange an interviewer’s assignment to 
minimize travel costs rather than making up random arrangements, and 
definite steps must be taken to make it possible to measure the variable 
contribution of the interviewers. More experimental work in this area 
is needed. The implications of this entire problem need thorough 
exploration, and the analysis presented in the present chapter can 
be considered only a step toward a systematic treatment of response 
error. 

It was indicated earlier that the results presented were applicable to the 
estimation of aggregates or averages for a total population, or for subgroups 
of the population provided that the assignments to subgroups are 
made without error. If the identification of whether a unit is or is not 
a member of the subgroup involves response errors, response bias usually 
will be present in estimates of subgroup means even when estimates of 
the population mean are unbiased, unless the errors involved in identifying 
a unit as a member of the subgroup are independent of each other and of 
the characteristic to be estimated. The effect of errors of measurement 
on correlations has sometimes been considered, but the situations dealt 
with ordinarily have been restricted to the case where errors are independent 
and “attenuate” the correlation. The effect of correlated errors 
requires much more attention than it has been given but is beyond the 
scope of the present discussion. 

7. Derivations and proofs. a. Description of the population. The 
mathematical model used in this chapter for the analysis of response 
errors assumes a population divided into N units. The unit may be an 
elementary unit or it may be a “cluster” of elementary units (e.g., a 
household or a group of households living in an area). The N units are 
divided into L “groups” with $N_h$ units in the hth group. There are $M_h$ 
interviewers available to interview the units in the hth group (and only 
these units). On any particular interview of a “unit” by an interviewer 
a response occurs. It is assumed that this response is a random variable 
for any interviewer and any respondent; i.e., a given response is considered 
to be only one of the possible responses which might be obtained 
from this respondent. Let 

$P_{hijk}$ = the probability that the response value $X_{hijk}$ will be obtained 
if the ith interviewer in the hth group interviews the jth unit. 

$P_{hijkuvw}$ = the probability that responses $X_{hijk}$ and $X_{huvw}$ are obtained 
(in the hth group) if the ith and uth interviewers interview 
the jth and vth units. 




We have then 

$$\sum_k P_{hijk} = 1 \qquad (7.1)$$

$$\bar X_{hij} = \sum_k P_{hijk}X_{hijk} \qquad (7.2)$$

where the sum is taken over all possible responses of the jth unit to the 
ith interviewer. 

We also have 

$$\sum_w P_{hijkuvw} = P_{hijk}, \qquad \sum_k P_{hijkuvw} = P_{huvw} \qquad (7.3)$$

$$E_{hijuv}(X_{hijk}X_{huvw}) = \sum_k\sum_w P_{hijkuvw}X_{hijk}X_{huvw} \qquad (7.4)$$

where the sum is taken over all possible responses of the jth unit to the 
ith interviewer and of the vth unit to the uth interviewer, and $E_{hijuv}$ is the 
conditional expected value for fixed interviewers i and u, and fixed 
respondents j and v. We shall assume that, if $i \ne u$ and $j \ne v$ (i.e., if 
both interviewer and respondent are different), the responses obtained are 
independent. Thus, if $i \ne u$ and $j \ne v$, 

$$P_{hijkuvw} = P_{hijk}P_{huvw} \qquad (7.5)$$

$$E_{hijuv}(X_{hijk}X_{huvw}) = \bar X_{hij}\bar X_{huv} \qquad (7.6)$$

b. Contribution of interviewer errors to the variance of a sample estimate 
from a simple random sample. Let us suppose that the sample design 
calls for drawing units from the population as a whole and drawing 
interviewers independently within each group. These selections are 
made at random. To each sample interviewer drawn from the hth group 
we assign, at random, a certain number of the units in the sample from 
the hth group. We fix in advance: 

(1) n = the total number of sample units to be drawn. 

(2) $\bar n_h$ = the number of sample units to be assigned to each sample 
interviewer from the hth group. 

Since the n units are drawn independently of the groups, the number of 
units falling in the sample in any group is a random variable. Let us 
designate the number of units in the sample from the hth group by $n_h$. 
Since we fix the number of sample units per interviewer in the hth group 
at $\bar n_h$, the number of sample interviewers to be drawn in this group will 
be $m_h = n_h/\bar n_h$, and $m_h$ will also be a random variable. 

Let 

$x_{hij}$ = a particular response obtained for the jth sample unit when 
interviewed by the ith sample interviewer drawn from the hth 
(population) group. 





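$$\bar x = \frac{\sum_h^L\sum_i^{m_h}\sum_j^{\bar n_h}x_{hij}}{n} \qquad (7.7)$$

The summation for k is omitted because only one response is recorded 
for any individual in the sample. 

Contingent upon drawing a unit from the hth group, the probability that 
$x_{hij} = X_{hijk}$ is $P_{hijk}/M_hN_h$, and there are $n_h$ values of $x_{hij}$. Thus 

$$E(\bar x \mid n_1, \ldots, n_L) = \frac{\sum_h^L n_h\bar X_h}{n} \qquad (7.8)$$

where 

$$\bar X_h = \frac{\sum_i^{M_h}\sum_j^{N_h}\sum_k P_{hijk}X_{hijk}}{M_hN_h} \qquad (7.9)$$

$$= \frac{\sum_i^{M_h}\sum_j^{N_h}\bar X_{hij}}{M_hN_h} \qquad (7.10)$$

Therefore 

$$E\bar x = \frac{\sum_h^L E(n_h)\,\bar X_h}{n} \qquad (7.11)$$

Since $E(n_h) = nN_h/N$, 

$$E\bar x = \frac{1}{N}\sum_h^L\frac{1}{M_h}\sum_i^{M_h}\sum_j^{N_h}\bar X_{hij} = \bar X \qquad (7.12)$$

To evaluate 

$$\sigma_{\bar x}^2 = E\left(\frac{\sum_h^L\sum_i^{m_h}\sum_j^{\bar n_h}x_{hij}}{n} - \bar X\right)^2 \qquad (7.13)$$

we shall make repeated applications of Theorem 15 of Ch. 3 (p. 65). 
First, let us assume that condition $b_1$ is that the sample of households 
and the sample of interviewers assigned to the households are fixed. 
Then, from Theorem 15, 

$$\sigma_{\bar x}^2 = E\sigma^2_{\bar x|b_1} + \sigma^2_{E(\bar x|b_1)} \qquad (7.14)$$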
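The decomposition that Eq. 7.14 takes from Theorem 15 (unconditional variance = expected conditional variance + variance of the conditional mean) is applied again below with the conditions $b_2$ and $b_3$. As a minimal check, this Python sketch verifies it exactly, with rational arithmetic, on a small hypothetical distribution in which the condition b (for instance, which interviewer is drawn) takes three equally likely values.

```python
from fractions import Fraction as F

# Hypothetical example: b takes three equally likely values; given b, the
# response takes each listed value with equal probability.
conditional = {
    "b1": [F(1), F(3)],
    "b2": [F(2), F(2), F(5)],
    "b3": [F(0), F(6)],
}
p_b = F(1, len(conditional))

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Left side: variance computed from the joint distribution of the response.
joint = [(p_b / len(xs), x) for xs in conditional.values() for x in xs]
mu = sum(p * x for p, x in joint)
total_var = sum(p * (x - mu) ** 2 for p, x in joint)

# Right side: E[var(x | b)] + var(E[x | b]).
e_cond_var = sum(p_b * var(xs) for xs in conditional.values())
var_cond_mean = sum(p_b * (mean(xs) - mu) ** 2 for xs in conditional.values())

print(total_var, e_cond_var + var_cond_mean)  # 38/9 38/9
```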







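Now, since 

$$E(\bar x\mid b_1) = \frac{\sum_h^L\sum_i^{m_h}\sum_j^{\bar n_h}\bar X_{hij}}{n} \qquad (7.15)$$

it follows that 

$$\sigma^2_{\bar x|b_1} = \frac{1}{n^2}\sum_h^L\sum_i^{m_h}\sum_j^{\bar n_h}E_{hij}(x_{hij}-\bar X_{hij})^2 + \frac{1}{n^2}\sum_h^L\sum_i^{m_h}\sum_{j\ne v}^{\bar n_h}E_{hijv}(x_{hij}-\bar X_{hij})(x_{hiv}-\bar X_{hiv}) \qquad (7.16)$$

where the expected values are conditional on the interviewers being fixed: 
$E_{hij}$ represents the conditional expected value for the ith interviewer in 
the sample and the jth household in the sample, and $E_{hijv}$ represents the 
conditional expected value with the jth and vth sample households fixed. 
Then $E\sigma^2_{\bar x|b_1}$, where $\sigma^2_{\bar x|b_1}$ is given by Eq. 7.16, is 

$$E\sigma^2_{\bar x|b_1} = \frac{1}{n^2}\sum_h^L E(n_h)\,\frac{1}{M_hN_h}\sum_i^{M_h}\sum_j^{N_h}E_{hij}(X_{hijk}-\bar X_{hij})^2 + \frac{1}{n^2}\sum_h^L E(m_h\bar n_h)\,\frac{\bar n_h-1}{M_hN_h(N_h-1)}\sum_i^{M_h}\sum_{j\ne v}^{N_h}E_{hijv}(X_{hijk}-\bar X_{hij})(X_{hivw}-\bar X_{hiv}) \qquad (7.17)$$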


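and since $E(m_h\bar n_h) = E(n_h) = nN_h/N$, we have 

$$E\sigma^2_{\bar x|b_1} = \frac{1}{nN}\sum_h^L\frac{1}{M_h}\sum_i^{M_h}\sum_j^{N_h}E_{hij}(X_{hijk}-\bar X_{hij})^2 + \frac{1}{nN}\sum_h^L\frac{\bar n_h-1}{N_h-1}\,\frac{1}{M_h}\sum_i^{M_h}\sum_{j\ne v}^{N_h}E_{hijv}(X_{hijk}-\bar X_{hij})(X_{hivw}-\bar X_{hiv}) \qquad (7.18)$$

To complete the evaluation of $\sigma_{\bar x}^2$ we must evaluate the second 
term of Eq. 7.14. 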



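Now $E(\bar x\mid b_1)$ is given by Eq. 7.15 and 

$$\sigma^2_{E(\bar x|b_1)} = E\left(\frac{\sum_h^L\sum_i^{m_h}\sum_j^{\bar n_h}\bar X_{hij}}{n} - \bar X\right)^2 \qquad (7.19)$$

We can again apply Theorem 15, Ch. 3 (p. 65), to Eq. 7.19 with the condition 
$b_2$ that the interviewers in the sample are fixed. First, let 

$$x' = \frac{\sum_h^L\sum_i^{m_h}\sum_j^{\bar n_h}\bar X_{hij}}{n} \qquad (7.20)$$

Then $\sigma^2_{E(\bar x|b_1)} = \sigma^2_{x'}$, and from Theorem 15, 

$$\sigma^2_{x'} = E\sigma^2_{x'|b_2} + \sigma^2_{E(x'|b_2)} \qquad (7.21)$$

Since 

$$E(x'\mid b_2) = \frac{\sum_h^L\sum_i^{m_h}\bar n_h\bar X_{hi}}{n} \qquad (7.22)$$

where $\bar X_{hi} = \sum_j^{N_h}\bar X_{hij}/N_h$, then 

$$\sigma^2_{x'|b_2} = \frac{1}{n^2}\sum_h^L\sum_i^{m_h}\frac{\bar n_h}{N_h}\sum_j^{N_h}(\bar X_{hij}-\bar X_{hi})^2 + \frac{1}{n^2}\sum_h^L\sum_i^{m_h}\frac{\bar n_h(\bar n_h-1)}{N_h(N_h-1)}\sum_{j\ne v}^{N_h}(\bar X_{hij}-\bar X_{hi})(\bar X_{hiv}-\bar X_{hi}) \qquad (7.23)$$

and since $E(m_h\bar n_h) = E(n_h) = nN_h/N$, we have 

$$E\sigma^2_{x'|b_2} = \frac{1}{Nn}\sum_h^L\frac{1}{M_h}\sum_i^{M_h}\sum_j^{N_h}(\bar X_{hij}-\bar X_{hi})^2 + \frac{1}{Nn}\sum_h^L\frac{\bar n_h-1}{M_h(N_h-1)}\sum_i^{M_h}\sum_{j\ne v}^{N_h}(\bar X_{hij}-\bar X_{hi})(\bar X_{hiv}-\bar X_{hi}) \qquad (7.24)$$

and to complete the evaluation of $\sigma_{\bar x}^2$, we must find $\sigma^2_{E(x'|b_2)}$ in Eq. 7.21. 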

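From Eq. 7.22 

$$E(x'\mid b_2) = \frac{\sum_h^L\sum_i^{m_h}\bar n_h\bar X_{hi}}{n}$$

If we now let $E(x'\mid b_2) = x''$, we can again apply Theorem 15, with the 
condition $b_3$ that the number of interviewers is fixed. Then 

$$\sigma^2_{x''} = E\sigma^2_{x''|b_3} + \sigma^2_{E(x''|b_3)} \qquad (7.27)$$

$$E(x''\mid b_3) = \frac{\sum_h^L m_h\bar n_h\bar X_h}{n}, \qquad \bar X_h = \frac{1}{M_h}\sum_i^{M_h}\bar X_{hi} \qquad (7.28)$$

$$\sigma^2_{x''|b_3} = E\left(\frac{\sum_h^L\sum_i^{m_h}\bar n_h\bar X_{hi}}{n} - \frac{\sum_h^L m_h\bar n_h\bar X_h}{n}\right)^2 \qquad (7.29)$$

$$= \frac{1}{n^2}\sum_h^L\sum_i^{m_h}\bar n_h^2\,E(\bar X_{hi}-\bar X_h)^2 \qquad (7.30)$$

since $E(\bar X_{hi}-\bar X_h)(\bar X_{hu}-\bar X_h) = 0$ when the interviewers are selected 
independently (i.e., with replacement). 

Now, since $E(m_h\bar n_h) = E(n_h) = nN_h/N$, 

$$E\sigma^2_{x''|b_3} = \frac{1}{nN}\sum_h^L N_h\bar n_h\,\frac{1}{M_h}\sum_i^{M_h}(\bar X_{hi}-\bar X_h)^2 \qquad (7.31)$$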



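and 

$$\sigma^2_{E(x''|b_3)} = E\left(\frac{\sum_h^L n_h\bar X_h}{n} - \bar X\right)^2 \qquad (7.32)$$

$$= \sum_h^L E\left(\frac{n_h}{n}-\frac{N_h}{N}\right)^2(\bar X_h-\bar X)^2 + \sum_{h\ne g}^L E\left(\frac{n_h}{n}-\frac{N_h}{N}\right)\left(\frac{n_g}{n}-\frac{N_g}{N}\right)(\bar X_h-\bar X)(\bar X_g-\bar X) \qquad (7.33)$$

Now, to evaluate Eq. 7.33, we note that $n_h$ has mean $nN_h/N$ and the hypergeometric 
variance $n\frac{N_h}{N}\left(1-\frac{N_h}{N}\right)\frac{N-n}{N-1}$, so that for the first term 

$$\sum_h^L E\left(\frac{n_h}{n}-\frac{N_h}{N}\right)^2(\bar X_h-\bar X)^2 = \frac{N-n}{(N-1)n}\sum_h^L\frac{N_h}{N}\left(1-\frac{N_h}{N}\right)(\bar X_h-\bar X)^2$$

For the second term, the covariance of $n_h$ and $n_g$ ($h \ne g$) is $-n\frac{N_hN_g}{N^2}\cdot\frac{N-n}{N-1}$, 
and $\sum_h N_h(\bar X_h-\bar X) = 0$, so that the second term of Eq. 7.33 is equal to 

$$-\frac{N-n}{(N-1)n}\sum_{h\ne g}^L\frac{N_hN_g}{N^2}(\bar X_h-\bar X)(\bar X_g-\bar X) = \frac{N-n}{(N-1)n}\sum_h^L\frac{N_h^2}{N^2}(\bar X_h-\bar X)^2$$

Combining the evaluation of the first term of Eq. 7.33 with that of the 
second term, we find that Eq. 7.33 is equal to 

$$\frac{N-n}{(N-1)n}\cdot\frac{\sum_h^L N_h(\bar X_h-\bar X)^2}{N}$$

and for N large relative to n, we have 

$$\sigma^2_{E(x''|b_3)} \doteq \frac{\sum_h^L N_h(\bar X_h-\bar X)^2}{nN} \qquad (7.34)$$

We finally have, collecting terms, 

$$\sigma_{\bar x}^2 = E\sigma^2_{\bar x|b_1} + E\sigma^2_{x'|b_2} + E\sigma^2_{x''|b_3} + \sigma^2_{E(x''|b_3)} \qquad (7.35)$$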
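The closed form found for Eq. 7.33 is the familiar variance of the mean of a simple random sample drawn without replacement, applied to unit values equal to their group means $\bar X_h$. The Python sketch below checks it by enumerating every possible sample from a small hypothetical population; the group sizes and group means are invented for illustration.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical population: N = 9 units in L = 3 groups, with group means X_h.
group_of = [0, 0, 0, 1, 1, 2, 2, 2, 2]
group_mean = [F(4), F(10), F(1)]
N, n = len(group_of), 4

sizes = [group_of.count(h) for h in range(len(group_mean))]           # N_h
pop_mean = sum(N_h * x_h for N_h, x_h in zip(sizes, group_mean)) / N  # X-bar

def estimate(sample):
    """sum_h n_h X_h / n for one sample of unit labels."""
    return sum(group_mean[group_of[u]] for u in sample) / n

# Exact variance over all C(N, n) equally likely samples.
samples = list(combinations(range(N), n))
exact = sum((estimate(s) - pop_mean) ** 2 for s in samples) / len(samples)

# Closed form: (N - n) / ((N - 1) n) * sum_h N_h (X_h - X)^2 / N.
between = sum(N_h * (x_h - pop_mean) ** 2 for N_h, x_h in zip(sizes, group_mean)) / N
formula = F(N - n, (N - 1) * n) * between

print(exact, formula)  # 15/8 15/8
```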


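where the terms in Eq. 7.35 are defined by Eq. 7.18, 7.24, 7.31, and 7.34. 
Equation 7.35, after considerable algebraic manipulation, reduces to 

$$\sigma_{\bar x}^2 = \frac{1}{n}\sum_h^L\frac{N_h}{N}\cdot\frac{\sum_i^{M_h}\sum_j^{N_h}E_{hij}(X_{hijk}-\bar X)^2}{M_hN_h} + \frac{1}{n}\sum_h^L\frac{N_h}{N}(\bar n_h-1)\frac{\sum_i^{M_h}\sum_{j\ne v}^{N_h}E_{hijv}(X_{hijk}-\bar X_h)(X_{hivw}-\bar X_h)}{M_hN_h(N_h-1)} \qquad (7.36)$$

Now let 

$$\sigma_X^2 = \sum_h^L\frac{N_h}{N}\cdot\frac{\sum_i^{M_h}\sum_j^{N_h}E_{hij}(X_{hijk}-\bar X)^2}{M_hN_h} \qquad (7.37)$$

We note that 

$$\sigma_{hXI} = \frac{\sum_i^{M_h}\sum_{j\ne v}^{N_h}E_{hijv}(X_{hijk}-\bar X_h)(X_{hivw}-\bar X_h)}{M_hN_h(N_h-1)} \qquad (7.38)$$

is the average covariance among the responses for a given interviewer. 
Then Eq. 7.36 can be written 

$$\sigma_{\bar x}^2 = \frac{\sigma_X^2}{n} + \frac{1}{n}\sum_h^L\frac{N_h}{N}(\bar n_h-1)\,\sigma_{hXI} \qquad (7.39)$$

where $\sigma_X^2$ is given by Eq. 7.37 and $\sigma_{hXI}$ by Eq. 7.38. Assuming 

$$\bar n_h = \frac{n}{m} = \bar n$$

for all h, Eq. 7.39 becomes 

$$\sigma_{\bar x}^2 = \frac{\sigma_X^2}{n} + \frac{\bar n-1}{n}\,\sigma_{XI} = \frac{\sigma_X^2-\sigma_{XI}}{n} + \frac{\sigma_{XI}}{m} \qquad (7.40)$$

where 

$$\sigma_{XI} = \frac{\sum_h^L N_h\sigma_{hXI}}{N}$$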
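A minimal sketch of Eq. 7.40 in Python, together with a simulation under an assumed model in which each interviewer adds a common random effect $b_i$ to all of his or her responses, so that $\sigma_{XI} = \text{var}(b_i)$. The model, the parameter values and the function names are illustrative assumptions, not part of the text; interviewers are drawn independently, as in the with-replacement selection assumed above, and N is treated as infinite.

```python
import random

def var_mean_with_interviewers(sigma2_x, sigma_xi, n, m):
    """Eq. 7.40: (sigma_X^2 - sigma_XI)/n + sigma_XI/m, with nbar = n/m per interviewer."""
    return (sigma2_x - sigma_xi) / n + sigma_xi / m

def simulated_var_mean(reps, m, nbar, sigma_b, sigma_e, seed=3):
    """Monte Carlo variance of the sample mean under x = b_i + e (assumed model)."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        total = 0.0
        for _ in range(m):                   # m independently drawn interviewers
            b = rng.gauss(0.0, sigma_b)      # effect shared by this interviewer's nbar cases
            total += sum(b + rng.gauss(0.0, sigma_e) for _ in range(nbar))
        means.append(total / (m * nbar))
    mu = sum(means) / reps
    return sum((x - mu) ** 2 for x in means) / (reps - 1)

m, nbar, sigma_b, sigma_e = 10, 20, 1.0, 3.0
n = m * nbar
sigma2_x, sigma_xi = sigma_b ** 2 + sigma_e ** 2, sigma_b ** 2

theory = var_mean_with_interviewers(sigma2_x, sigma_xi, n, m)  # about 0.145
naive = sigma2_x / n                                            # 0.05, ignores interviewers
simulated = simulated_var_mean(2000, m, nbar, sigma_b, sigma_e)
print(theory, naive, simulated)
```

Under these assumed values a modest interviewer variance (1 out of a total of 10) nearly triples the variance of the mean, because it is divided by m = 10 rather than by n = 200.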


c. Estimates of variance from sample results. Let 

W/v rih nih 

2 ifefi - ^hif 


^hX — 


mm 


To prove: 
Proof 




m 


^^Ix — ^Ix — ^ip^hi. 


hijk 


h 


m.H 


ES-J ---- = £-? i £-i— 


Since 




m„n, 


m, 


(7.41) 

(7.42) 

(7.43) 


Ex, 


and Exit = -f Xl 


Eq. 7.43 = alx + Xl - /n„o-| - 


= Ohx - 


Moreover, from Sec. 2, Ch. 6, the expected value of the 

Eq- 7-41 is 

2fef - 


E^ 


Therefore, 
To prove: 
where 


«»-1 


= mm 


h Sh 


Eslx ~ ^hX 

^^hXl ~ ^hXI 

m,h Mh rih 


^hXI — 


4 


"1*- 1 


Proof. From Sec. 2 of Ch. 6 


2fei - 


m. 


- 1 ) 




(7.44) 
last term in 

(7.45) 

(7.46) 

(7.47) 

(7.48) 


(7.49) 







and 



i 




«o W- 

< t |- -X^ 


n~\t N ' n-\ N «- I * h-1 


n AN; 


2^4 + 


2n,(a,-;P)2 


n- 1 r N 1 


N 


«— 1 


Since 


L 

Inu^h 


then, by Theorem 15 of Ch. 3 (p. 65), 


al=:E~ 


*2 


-{-E 


\ n 

Since, as shown above, Etij, = n{NJN\ it follows that 


IL 


i' 

I ^ A_„_ 

Nn 


We have 


Therefore 


I«i4 

ol = ^ 


n‘ 


Nn 



(7.60) 

(7.61) 


(7.62) 


(7.63) 


(7.64) 


|"'*(^^-")' « AN 


-- —2— V ^2 j_ii_ 

1 n— if N 1 


j,N,(x^-xy 


N 


1 7 , ^N^i^,~Xf 

E Inhl - -~r ;r- (7.65) 


Now 


n(n - 1) t 


Nin - 1) 


7 - Xhf 

Jnkin - nn) ^ -j— 


n(n~ 1) 


n(n— 1) 

2«.«4 


= E 




n(n~l) n(n-~l) 


(7.66) 











.2 _ 

— 


I."h4x 


n 


s 


2 

bX 




L mn n 

2 2 '2Sp^hij 

h i j 


n 


+ 


y 

nm^- 1 


mn 


% 

m 


+ 


2 «.(% - xf 2 « ^ 

^ /i ntf 


;-7 lipCm-Xj^y 

h ■•■ i 


n — 1 


n{n — 1) 


L rrih n 

1 2 2 (%« - - , 

h i i , n 1 

^ 


n- 1 


fi-~ 1 


L mn n 

2 2 I,(Xkij — xf 

_ i _ j _ I ^ ^ 

n(n — 1) 


(7.74) 


(7.75) 


d. Variance of a mean with a double sampling design. This section 
considers a case where n individuals are drawn at random, and from these 
n individuals, $\bar n$ individuals are assigned at random to each of m sample 
interviewers ($m_h$ interviewers from the hth group). For this sample we 
obtain responses under essential conditions X. From the $\bar n$ sample 
cases assigned to each interviewer we subsample at random $\bar n'$ cases, 
giving a total subsample of $n' = m\bar n'$. We also have a set of L' interviewer 
groups (which may or may not be the same as the original interviewer 
groups). Of the subsample cases, $n'_p$ are available for interview by interviewers 
in the pth of the L' interviewer groups. We draw $m'_p$ (= $m'n'_p/n'$, 
where m' is determined in advance) interviewers from the pth interviewer 
group and assign at random to each of these sample interviewers 
$\bar n''$ (= n'/m') individuals. The second set of interviewers obtains responses 
under essential conditions Z from each of the n' individuals in the 
subsample. We use as an estimate of $\bar Y$, the true population mean, 

$$\bar z = \frac{\bar x\,\bar z'}{\bar x'} \qquad (7.76)$$

where 

$$\bar x = \frac{\sum_h^L\sum_i^{m_h}\sum_j^{\bar n}x_{hij}}{n} \qquad (7.77)$$

$$\bar x' = \frac{\sum_h^L\sum_i^{m_h}\sum_j^{\bar n'}x_{hij}}{n'} \qquad (7.78)$$

$$\bar z' = \frac{\sum_p^{L'}\sum_q^{m'_p}\sum_j^{\bar n''}z_{pqj}}{n'}$$

The expected value and variance of $\bar x$ have already been derived. The 
expected values and variances of $\bar x'$ and $\bar z'$ are identical with the values 
for a random sample of n' cases drawn without reference to the sample 
of n cases from which $\bar x$ is calculated. Thus 

$$E\bar x' = \bar X = \frac{\sum_h^L\frac{1}{M_h}\sum_i^{M_h}\sum_j^{N_h}\bar X_{hij}}{N} \qquad (7.79)$$

where $\bar X_{hij}$ = the expected response value for the jth individual in the 
population interviewed by the ith interviewer in the hth 
group under essential conditions X, 

and 

$$E\bar z' = \bar Z = \frac{\sum_p^{L'}\frac{1}{M'_p}\sum_q^{M'_p}\sum_j^{N'_p}\bar Z_{pqj}}{N} \qquad (7.80)$$

where $\bar Z_{pqj}$ = the expected response value for the jth individual in the 
population interviewed by the qth interviewer in the pth 
group under essential conditions Z. 

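Using the usual approximations to the expected value and variance 
of $\bar z$ gives 

$$E\bar z \doteq \frac{(E\bar x)(E\bar z')}{E\bar x'} = \bar Z \qquad (7.81)$$

$$\sigma^2_{\bar z} \doteq \bar Z^2\left(\frac{\sigma^2_{\bar x}+\sigma^2_{\bar x'}-2\sigma_{\bar x\bar x'}}{\bar X^2} + \frac{\sigma^2_{\bar z'}}{\bar Z^2} + \frac{2(\sigma_{\bar x\bar z'}-\sigma_{\bar x'\bar z'})}{\bar X\bar Z}\right) \qquad (7.82)$$

Since $\sigma_{\bar x\bar x'} = \sigma^2_{\bar x}$, 

$$\sigma^2_{\bar z} \doteq \bar Z^2\left(\frac{\sigma^2_{\bar x'}-\sigma^2_{\bar x}}{\bar X^2} + \frac{\sigma^2_{\bar z'}}{\bar Z^2} - \frac{2(\sigma_{\bar x'\bar z'}-\sigma_{\bar x\bar z'})}{\bar X\bar Z}\right) \qquad (7.83)$$

and 

$$\text{MSE}\,\bar z = (\bar Z-\bar Y)^2 + \sigma^2_{\bar z} \qquad (7.84)$$

It will be noted that the bias of $\bar z$ is approximately $(\bar Z-\bar Y)$. Thus, if 
$\bar Z$ is closer to $\bar Y$ than is $\bar X$, $\bar z$ may be a better estimate of $\bar Y$ than $\bar x$, even 
when the variance of $\bar z$ exceeds the variance of $\bar x$. 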


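As previously noted, the variances of $\bar x'$ and $\bar z'$ are the same as the 
variances when the subsample of n' is drawn independently of the sample 
of n. Thus 

$$\sigma^2_{\bar x'} = \frac{\sigma_X^2-\sigma_{XI}}{n'} + \frac{\sigma_{XI}}{m} \qquad (7.85)$$

$$\sigma^2_{\bar z'} = \frac{\sigma_Z^2-\sigma_{ZI}}{n'} + \frac{\sigma_{ZI}}{m'} \qquad (7.86)$$

and, from Eq. 7.40, 

$$\sigma^2_{\bar x} = \frac{\sigma_X^2-\sigma_{XI}}{n} + \frac{\sigma_{XI}}{m} \qquad (7.87)$$

We also have, writing $x_u$ and $z_u$ for the X and Z responses of the uth subsample 
individual, with $v \ne u$, 

$$nn'E(\bar x\bar z') = n'E(x_uz_u) + n'(n-1)E(x_uz_v) = n'E(x_uz_u) + n'(n-1)\bar X\bar Z \qquad (7.88)$$

so that 

$$E(\bar x\bar z') = \frac{E(x_uz_u)}{n} + \frac{n-1}{n}\bar X\bar Z \qquad (7.89)$$

$$\sigma_{\bar x\bar z'} = E(\bar x\bar z') - \bar X\bar Z = \frac{\sigma_{XZ}}{n} \qquad (7.90)$$

where 

$$\sigma_{XZ} = E(x_uz_u) - \bar X\bar Z \qquad (7.91)$$

and 

$$(n')^2E(\bar x'\bar z') = n'E(x_uz_u) + n'(n'-1)E(x_uz_v) \qquad (7.92)$$

$$= n'E(x_uz_u) + n'(n'-1)\bar X\bar Z \qquad (7.93)$$

so that 

$$\sigma_{\bar x'\bar z'} = \frac{\sigma_{XZ}}{n'} \qquad (7.94)$$

Thus 

$$\sigma^2_{\bar z} \doteq \bar Z^2\left[\left(\frac{1}{n'}-\frac{1}{n}\right)U_{XZ} + \frac{V_Z^2}{n'} + \frac{W_Z}{m'}\right] \qquad (7.95)$$

where 

$$U_{XZ} = \frac{\sigma_X^2-\sigma_{XI}}{\bar X^2} - \frac{2\sigma_{XZ}}{\bar X\bar Z} \qquad (7.96)$$

$$V_Z^2 = \frac{\sigma_Z^2-\sigma_{ZI}}{\bar Z^2} \qquad (7.97)$$

$$W_Z = \frac{\sigma_{ZI}}{\bar Z^2} \qquad (7.98)$$

Unbiased estimates from the sample of $\sigma_X^2$ and $\sigma_{XI}$ have been derived 
above. Estimates of $\sigma_Z^2$ and $\sigma_{ZI}$ will have the same form as those for 
$\sigma_X^2$ and $\sigma_{XI}$, using the values $z_{pqj}$. As an unbiased estimate of $\sigma_{XZ}$ 
we have 

$$s_{XZ} = \frac{\sum_u^{n'}(x_u-\bar x')(z_u-\bar z')}{n'-1} \qquad (7.99)$$

$$E s_{XZ} = \sigma_{XZ} \qquad (7.100)$$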
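Equations 7.84 through 7.98 can be used to compare $\bar z$ with $\bar x$ when planning a double-sampling design. The Python sketch below evaluates the approximate variance of $\bar z$ from Eq. 7.95 and the mean square errors of $\bar z$ and $\bar x$ for one set of hypothetical parameter values (none of them come from the text), taking $\bar Z = \bar Y$ so that $\bar z$ is approximately unbiased. As a consistency check it also evaluates Eq. 7.83 directly from the component variances and covariances; the keys `n1` and `m1` stand for n' and m'.

```python
def var_mean(sigma2, sigma_i, n, m):
    """Variance of a mean of n cases spread over m interviewers (Eqs. 7.85-7.87)."""
    return (sigma2 - sigma_i) / n + sigma_i / m

def var_z_bar(p):
    """Eq. 7.95: Z^2 [ (1/n' - 1/n) U_XZ + V_Z^2/n' + W_Z/m' ]."""
    X, Z = p["X"], p["Z"]
    u_xz = (p["s2x"] - p["sxi"]) / X**2 - 2 * p["sxz"] / (X * Z)  # Eq. 7.96
    v2_z = (p["s2z"] - p["szi"]) / Z**2                           # Eq. 7.97
    w_z = p["szi"] / Z**2                                         # Eq. 7.98
    return Z**2 * ((1 / p["n1"] - 1 / p["n"]) * u_xz + v2_z / p["n1"] + w_z / p["m1"])

def var_z_bar_components(p):
    """Eq. 7.83 evaluated from the component variances and covariances."""
    X, Z = p["X"], p["Z"]
    v_x = var_mean(p["s2x"], p["sxi"], p["n"], p["m"])     # variance of x-bar
    v_x1 = var_mean(p["s2x"], p["sxi"], p["n1"], p["m"])   # variance of x-bar'
    v_z1 = var_mean(p["s2z"], p["szi"], p["n1"], p["m1"])  # variance of z-bar'
    c_x1z1, c_xz1 = p["sxz"] / p["n1"], p["sxz"] / p["n"]  # Eqs. 7.94 and 7.90
    return Z**2 * ((v_x1 - v_x) / X**2 + v_z1 / Z**2 - 2 * (c_x1z1 - c_xz1) / (X * Z))

# Hypothetical values: X = 50, Z = Y = 45; n = 2000 cases with m = 40 interviewers,
# subsample n' = 200 with m' = 20 interviewers under conditions Z.
p = dict(X=50.0, Z=45.0, Y=45.0, s2x=400.0, sxi=4.0, s2z=380.0, szi=2.0,
         sxz=300.0, n=2000, m=40, n1=200, m1=20)

mse_z = (p["Z"] - p["Y"]) ** 2 + var_z_bar(p)                                  # Eq. 7.84
mse_x = (p["X"] - p["Y"]) ** 2 + var_mean(p["s2x"], p["sxi"], p["n"], p["m"])
print(round(var_z_bar(p), 3), round(mse_z, 3), round(mse_x, 3))
```

With these made-up values $\bar z$ has the larger variance (about 1.0 against about 0.30 for $\bar x$) but a far smaller mean square error, which is the situation described after Eq. 7.84.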


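e. Variance of estimates of response bias. We have considered the case 
where samples are drawn as described in Part d (“double sampling 
design”), and it can be assumed that $\bar Z = \bar Y$. In this situation estimates 
of the bias of $\bar x$ as an estimate of $\bar Y$ are 

$$b_X = \bar x' - \bar z' \qquad (7.101)$$

and 

$$b'_X = \bar x - \bar z = \bar x\left(1 - \frac{\bar z'}{\bar x'}\right) \qquad (7.102)$$

Thus $b_X$ is an unbiased estimate of the response bias $B_X = \bar X - \bar Z$. 
The “ratio” estimate $b'_X$ is a consistent estimate of $B_X$. For the variance 
of $b_X$ we have 

$$\sigma^2_{b_X} = \sigma^2_{\bar x'} + \sigma^2_{\bar z'} - 2\sigma_{\bar x'\bar z'} \qquad (7.103)$$

Substituting values obtained above for $\sigma^2_{\bar x'}$, $\sigma^2_{\bar z'}$, and $\sigma_{\bar x'\bar z'}$ gives 

$$\sigma^2_{b_X} = \frac{\sigma_X^2-\sigma_{XI}}{n'} + \frac{\sigma_{XI}}{m} + \frac{\sigma_Z^2-\sigma_{ZI}}{n'} + \frac{\sigma_{ZI}}{m'} - \frac{2\sigma_{XZ}}{n'} \qquad (7.104)$$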
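Equation 7.104 shows how the precision of the bias estimate depends on the subsample size n' and on the numbers of interviewers m and m'. This Python sketch, with hypothetical parameter values that are not taken from the text, tabulates the standard error of $b_X$ for a few subsample sizes with m and m' held fixed, and checks that the formula agrees with Eq. 7.103 assembled from its components; `n1` and `m1` stand for n' and m'.

```python
import math

def var_bias_estimate(s2x, sxi, s2z, szi, sxz, n1, m, m1):
    """Eq. 7.104: variance of b_X = x-bar' - z-bar'."""
    return ((s2x - sxi) / n1 + sxi / m
            + (s2z - szi) / n1 + szi / m1
            - 2 * sxz / n1)

def var_bias_from_components(s2x, sxi, s2z, szi, sxz, n1, m, m1):
    """Eq. 7.103: var(x-bar') + var(z-bar') - 2 cov(x-bar', z-bar')."""
    var_x1 = (s2x - sxi) / n1 + sxi / m
    var_z1 = (s2z - szi) / n1 + szi / m1
    cov_x1z1 = sxz / n1
    return var_x1 + var_z1 - 2 * cov_x1z1

# Hypothetical parameters: m = 40 interviewers under X, m' = 20 under Z.
params = dict(s2x=400.0, sxi=4.0, s2z=380.0, szi=2.0, sxz=300.0, m=40, m1=20)
for n1 in (100, 200, 400, 800):
    v = var_bias_estimate(n1=n1, **params)
    print(n1, round(math.sqrt(v), 3))
```

Under these assumed values the standard error falls as n' grows but levels off, because the interviewer terms $\sigma_{XI}/m + \sigma_{ZI}/m'$ in Eq. 7.104 do not depend on n'.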


For the variance of $b'_X$ we have 


and 

Thus 



(7.105) 

(7.106) 

(7.107) 


(7.108) 




REFERENCES 

(1) Russell L. Ackoff and Leon Pritzker, “The Methodology of Survey Research,” Intern. J. Opinion and Attitude Res., 5, No. 3 (1951), 313-334. 

(2) Gertrude Bancroft and Emmett H. Welch, “Recent Experience with Problems of Labor Force Measurement,” J. Amer. Stat. Assn., 41 (1946), 303-312. 

(3) W. Edwards Deming, “On Errors in Surveys,” Amer. Sociol. Rev., 9 (1944). 

(4) Morris H. Hansen and William N. Hurwitz, “The Problem of Non-Response in Sample Surveys,” J. Amer. Stat. Assn., 41 (1946), 517-529. 

(5) P. C. Mahalanobis, “Recent Experiments in Statistical Sampling in the Indian Statistical Institute,” J. Roy. Stat. Soc., 109 (1946), 325-370. 

(6) Eli S. Marks and W. Parker Mauldin, “Problems of Response in Enumerative Surveys,” Amer. Sociol. Rev., 15 (1950), 649-657. 

(7) Eli S. Marks, W. Parker Mauldin, and Harold Nisselson, “A Case History in Survey Design: The Post-Enumeration Survey of the 1950 Census,” J. Amer. Stat. Assn., 48 (June 1953). 

(8) Gladys L. Palmer, “Factors in the Variability of Response in Enumerative Studies,” J. Amer. Stat. Assn., 38 (1943), 143-152. 

(9) Morris H. Hansen, William N. Hurwitz, Eli S. Marks, and W. Parker Mauldin, “Response Errors in Surveys,” J. Amer. Stat. Assn., 46 (1951). 





Index 


Accuracy, 8 

Ackoff, Russell L., 281, 325 
Aggregates, effect of variation in size 
of cluster on estimates of, 203, 
204 

estimated from time series, 270, 275 
estimates of, 194 
variance of, 116 
Attribute, 3 

Bancroft, Gertrude, 325 
Bias, 46 

compared with standard deviation, 
113 

contribution to mean square error, 
51 

of approximation to variance of ratio estimate, 109 
of ratio estimate, 112, 126 
Binomial distribution, 105 
Binomial theorem, 77 
Bose, Chameli, 279 

Census, Bureau of the, 176, 204, 249 
Change, estimate of, 269, 271, 274, 
276 

Changes in probabilities, adjustment 
for, 234 

Chapman, R. A., 176 
Characteristic of a population, 2 
Cluster sampling, 5, 31, 142, 177, 194, 
208 

effect of homogeneity within clusters, 
161, 168, 222 

effect of unequal sizes of strata, 215 
effect of variable sizes of cluster, 
161, 203, 204 

estimates of components of variance, 
158, 181, 228 

estimates of total variance, 151, 153, 
180, 216, 218, 246, 293 


Cluster sampling, gains with stratification, 185, 227 

interviewer errors reflected in variance, 293 
listing unit, 143 
notation, 142, 205 

optimum allocation, see Optimum allocation 

precision of variance estimates, 243 
245 

ultimate clusters, 156, 165, 293 
variance, components of variance, 
rel-variance, and covariance: 
contribution from each stage of 
sampling, 182 

for more general design, 208 
for simple one- or two-stage sampling, 144 

for stratified one- or two-stage 
sampling, 177 

Cochran, W. G., 120, 141, 176, 193, 204, 279 

Coefficient, of correlation, 55, 96, 122 
of variation, 50 

of estimated standard deviation, 104 

of ratio of random variables, 109 
rel-variance of estimated, 237 
Combinations and permutations, 36ff. 
Complementary event, 16, 28 
Components of variance, see Cluster 
sampling 

Conditional expectation, 59, 60 
Conditional probability, 24, 59 
Conditional variance, 63 
Conditioning of respondents, 284 
Confidence limits, for estimated standard deviation, 104 
for medians, 247 
for order statistics, 247 
for ratio estimate, 109 
Consistent estimates, 72, 74, 75, 120 







Convergence in probability, 72, 107 
Cornfield, Jerome, 262 
Correlation coefficient, 55 
of sample means, 96, 122 
Cost functions, see Optimum allocation 
Covariance, 51; see also Variance 
conditional, 63 
of linear combinations, 57 
Cramér, H., 38, 42, 89 

Dalenius, Tore, 141 
Deming, W. E., 120, 141, 176, 281, 
325 

Dependent random variables, 42 
Difference estimates, 250, 272 
Double sampling, compared with simple random sampling, 256 
to reduce response bias, 299, 321 
with regression estimates, 254 
with stratification, 257 
Double summation, 11, 13 
Duplication of a subset of elements, 
139 

Element, 1, 5, 6 
Elementary event, 15 
Elementary unit, 3 

Equal probability selection method, 17 
Epsem, 17 
Estimate, 7 

Estimating variances, see Variance estimates 

Evans, W. Duane, 262 
Event, 16 

Exhaustive events, 28 
Expectation, 44 
conditional, 59 

Expected response value, 284 
Expected value, 39, 44 
intuitive meaning of, 45 
theorems on, 46-69 

conditional expected values, 61 
conditional variance and covariance, 63 

linear combination, 49 
product of independent variables, 
54 

sum of random variables, 48 
variances and covariances, 56, 57 


Feller, W., 38, 42, 89 
Fieller, E. C., 109, 120 
First-stage sampling units, 5 
Fourth moment (μ₄), 99 
Frame, 1 

Frankel, L. R., 141 

Ghosh, M. N., 175 
Goursat, E., 132 
Gurney, Margaret, 109, 141 

Hansen, M. H., 204, 235, 249, 279, 
280, 325 

Hedrick, E. R., 132 
Horvitz, D. G., 204 
Hurwitz, W. N., 204, 235, 249, 279, 
280, 325 

Independent events, 23 
Independent random variables, 41, 54 
Indian Statistical Institute, 295 
Interviewer effect on variance, 288, 
310 

Interviewers, optimum number of, 294 
Intraclass correlation, 164; see also 
Measure of homogeneity 

Jebe, Emil H., 235 
Jessen, R. J., 175, 193, 279 

Kendall, M. G., 33 
Keyfitz, Nathan, 234, 235 
Kurtosis, measure of (β), 99 

Lagrange multipliers, 132 
Large psu’s, 205ff. 

Latin-square design, 262 
Limit, rapidity of approach to, 85 
Linear combination of random variables, expected value of, 49 
variance and covariance of, 56, 57 
List, 1 

Listing units, 143, 165 

Madow, L. H., 279 
Madow, W. G., 76, 279 
Mahalanobis, P. C., 176, 281, 295, 296, 
325 

Markov inequality, 70 
Marks, Eli S., 176, 280, 281, 325 




Mathematical expectation, 39 
definition of, 44 
theorems on, 46-69 

Mathematical model for response errors, 281 

Mauldin, W. Parker, 280, 281, 325 
McCreary, Garnet E., 176 
Mean square error, 51 
Measurable sampling plans, 33 
Measure of homogeneity, 157, 161, 222 
estimated from sample, 163 
extreme values of, 169 
for frequently occurring populations, 168 

for ultimate clusters, 165 
intraclass correlation, 164 
relationships among measures of, 170 
Median, confidence limits for, 247 
Midzuno, Hiroshi, 204 
Moments, 81 
fourth (μ₄), 99 

divided by variance squared (β), 99 

Mood, A. M., 248 

Multi-phase sampling, see Double sampling 

Multi-stage sampling, 5, 205; see also 
Cluster sampling 
Mutually exclusive events, 28 

Neyman, J., 141, 279 
Nielsen, A. C., Co., 303 
Nisselson, H., 280, 281, 325 
Nonprobability sampling, 9 
Nonresponse, 257, 282 
Nonsampling errors, 280 
Normal limiting distribution, 76 
Notation, cluster sampling, 142, 205 
summation, 11
Null event, 27 

Number of possible samples, 31 

Operation, 15, 16 
conditional, 24 

Operations, number of possible results 
of, 36, 37 

Optimum allocation, double sampling, 
with regression estimates, 254 
with stratification, 257



Optimum allocation, for call-backs on 
expensive units, 257 
for time series, 274 
for two occasions, 269-272 
minimizing total survey error, 298 
of a subsample for estimating variance of a stratified sample, 239
self-weighting three-stage stratified 
design, 223 

simple two-stage cluster sampling, 
172, 173 

stratified one- and two-stage cluster 
sampling, 192 

stratified simple random sampling, 
132, 135 

gains over proportionate sampling, 
134 

variance at optimum, 134 
stratified two- or more stage cluster 
sampling, 187, 188, 200, 223, 
232 

when uniform over-all sampling fraction is optimum, 234
with complex cost function, 173, 188, 
223 

with optimum weights, 265 
with simple cost function, 135, 172, 
187, 192, 197, 200, 232, 255, 
259, 295, 301 

with varying probabilities, 197 
Optimum probabilities, 197, 200 
compared with stratification by size 
of psu, 200 

Optimum weights for estimating a ratio from a stratified sample, 265
Order of convergence, 85 
Order statistics, confidence limits for, 
247 

Pairwise independent, 42 
Palmer, Gladys L., 325 
Patterson, H. D., 279 
Permutations, 20, 36, 37 
Physical properties of frequently occurring populations, 168
Population, 1-6 

Population Sampling, A Chapter in, 
176, 204, 249 
Possible result, 15, 16 










Precision, 7 

Primary sampling units, 5 
Pritzker, Leon, 281, 325 
Probabilities, optimum, 197 
Probability, 15 
conditional, 24, 59 
limits with Tchebycheff inequality, 
69 

of event, 16 
of selection, 16 

at ith drawing, 30, 91 
variable probabilities, 62, 194 
proportionate to size, 62, 213
selection by, 18 
theorems on, 15-38 
Probability sampling, 4, 9 
Probability selection methods, 16 
Product event, 22, 27 
Product of random variables, 54 
Proportion, rel-variance of estimated 
standard deviation, 105 
sample size needed to estimate standard deviation of, 105
variance of, 53 

Proportionate stratified sampling, 124 
biases of alternative ratio estimates, 
126 

compared with optimum allocation, 
134 

gain due to stratification, 130 

Random event, 42, 61, 65 
possible states, 42 
relation to random variable, 43 
Random group estimate of variance, 
240 

rel-variance of, 241 

Random grouping of elements into 
strata, variance between stratum 
means, 131 
Random numbers, 33 
Random sampling, with replacement, 
20, 41 

without replacement, 18, 41 
Random variable, 39, 40 
functions of, 44 
relation to random event, 43 
Ratio estimate, based on weighted average of ratios of random variables, 125


Ratio estimate, bias, approximation to, 
112 

bias of, relative to standard deviation, 113

compared with simple unbiased estimate, 119

comparison of biases of alternative 
ratio estimates with stratified 
sampling, 126 

conditions when unbiased, 114 
rel-variance of estimate of variance, 
117 

some special ratio estimates, 225 
variance of, 107; see also Cluster 
sampling 

in terms of measure of homogeneity, 161, 222

when approximation is good, 109 
variance of alternative ratio estimates with stratified sampling, 124, 125, 128

Ratio of random variables, see Ratio 
estimate 

Regression coefficient, 253 
Regression estimates, 250, 254, 268 
Rel-variance, 51; see also Variance 
Replacement, sampling with, 20, 41 
sampling without, 18, 41 
Response errors, 280 

choice of design to control, 299 
compensating and uncorrelated, 305 
effect of, reflected in variance estimate, 293, 305
individual, 285 

interviewer contribution to, 288, 294, 
295 

mathematical model for, 281 
response bias, 285 

role of, in determining survey design, 
280 

variance of estimated response bias, 
324 

Restricted sampling designs, Latin-square, 262; see also Cluster
sampling; Stratified sampling; 
Systematic sampling 
Root mean square error, 51 

Sample, 4, 6 
Sample design, 7 





Sampling error, see Variance 
Sampling fraction, 92, 144, 207 
Sampling plan, 4 
Sampling unit, 5 
Schumacher, F. X., 176 
Second-stage sampling units, 5 
Self-weighting sample, 197 
Sequence of estimates, 72 
Shrivastava, M. P., 296 
Simple one- or more stage cluster 
sampling, 142ff.; see also Cluster
sampling 

Simple random sampling, 19, 21, 32, 
90ff. 

compared with stratified sampling, 
31, 130 

Simple unbiased estimate, 119, 144, 
207 

Size of sample needed to estimate 
standard deviation, 105 
Skalak, Blanche, 173 
Smith, B. Babington, 33
Stage of sampling, 5, 142 
Standard deviation, 51; see also Variance

coefficient of variation of estimated, 104

confidence limits for estimated, 104 
rel-variance of estimated, 102 
size of sample needed to estimate, 105

Standard error, see Standard deviation 
Stephan, F. F., 139 
Stock, J. S., 141 

Stratification, after sampling, 138 
by size of psu compared with optimum probabilities, 200, 202
effect of, with cluster sampling, 185, 
227 

when to equalize size of strata, 215 
Stratified sampling, 30, 121ff.
bias with ratio estimate, 126 
cluster sampling, see Cluster sampling

correction for bias in estimation of 
total variance, 138 
optimum allocation to strata; see 
Optimum allocation 
proportionate stratified sampling, 
124 


Stratified sampling, stratified simple 
random sampling, 30, 121 
compared with simple random 
sampling, 31, 130 
estimate and variance, 57, 121 
gain from optimum allocation, 134 
gain from proportionate sampling, 
130 

precision of estimated variance 
within strata, 239 
proportionate selection, 31 
variance with optimum allocation, 
134 

substratification with cluster sampling, 205, 206

to control variation in size of psu, 
200 

with double sampling, 257 
Subsampling, 5, 32 
Subset, 4, 16, 28 
of a population, 106 
variance of an average or total for a 
subset, 114 

Substitution to accomplish weighting, 139

Successive occasions, sampling on, 268, 
272 

Sukhatme, P. V., 141 
Sum event, 26-27 
evaluation of probability of, 28 
Sum of random variables, expected 
value of, 48 
variance of, 56 
Summation notation, 11 
Sums of powers, 77 
Survey design, 8 
Systematic sampling, 21 

Tchebycheff, 69 
Tepping, B. J., 173 
Thompson, D. J., 204 
Thompson, W. R., 248
Three- or more stage sampling, 182, 205ff.; see also Cluster sampling

Time series, sampling for, 268, 272 
Total variance, estimate of, 138, 151, 
157, 180, 216, 218, 291 
Totals, estimates of, 194; see also Aggregates









Travel cost, 188, 191 
Trend, estimate of, 269, 271, 274, 276 
True value, 8, 282 
Tschuprow, A. A., 141 
Two-stage sampling, 142; see also 
Cluster sampling 

Ultimate cluster, 156, 293 
estimate of variance, 156 
measure of homogeneity for, 165 
Unbiased estimate, 46 
Uncorrelated but dependent variables, 
55 

Uniform sampling fraction, 144, 207 

Variable probabilities, 194, 208, 213 
adjustment for changes in, 234 
optimum, 197 

Variance (also rel-variance and covariance), 50

comparison of and V^, 202 
components of, or contributions to, 
144; see also Cluster sampling 
conditional, 63 

contribution to mean square error, 
51 

estimates of, see Variance estimates 
for cluster sampling, see Cluster 
sampling 

for double sampling, 254, 257 
for simple random sampling, 92, 96 
for stratification after sampling, 138 
for stratified sampling, see Stratified 
sampling; Cluster sampling 
in terms of measures of homogeneity, 161, 222
of difference estimate, 250 
of estimated coefficient of variation, 
236 

of estimated rel-variance, 236 
of estimated standard deviation, 102 
of estimated variance, 99, 117 


Variance, of estimates for subsets, 114 
of linear combinations of random 
variables, 56 

of ratio estimates by subclasses, 225 
of ratios of random variables, 107; see also Ratio estimate; Stratified sampling; Cluster sampling
of regression estimate, 251 
of simple unbiased estimate compared to ratio estimate, 119
with optimum allocation, 134, 137 
with varying probabilities, 194, 208 
within and between strata, 129 
Variance estimates, for cluster sampling, see Cluster sampling
for simple random sampling, 98, 120 
rel-variance of estimated standard 
deviation, 102 

rel-variance of estimated variance, 
99 

for stratified simple random sampling, 137

from random groups, 240 
precision of, 99, 102, 105, 236, 237, 
240 

with known stratum means or totals, 
246 

Variation in size of cluster, control of, 
194 

Varying probabilities, 18, 62, 194, 213 

Weighting, by random substitution, 139 
optimum weights, 265 
Welch, Emmett H., 325 
Wilks, S. S., 248

Within strata variance estimates, precision of, 237

Woodruff, Ralph S., 247, 249 
Yates, F., 176, 279 
Zacopanay, I., 176 


